Qwen has just released 8 new models as part of its latest family, Qwen3, showcasing promising capabilities. The flagship model, Qwen3-235B-A22B, outperformed most other models, including DeepSeek-R1, OpenAI's o1 and o3-mini, Grok 3, and Gemini 2.5-Pro, on standard benchmarks. Meanwhile, the small Qwen3-30B-A3B outperformed QwQ-32B, which has approximately 10 times as many activated parameters as the new model. With such advanced capabilities, these models are a great choice for a wide range of applications. In this article, we will explore the features of all the Qwen3 models and learn how to use them to build RAG systems and AI agents.
Qwen3 is the latest series of large language models (LLMs) in the Qwen family, consisting of 8 different models: Qwen3-235B-A22B, Qwen3-30B-A3B, Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B. All of these models are released under the Apache 2.0 license, making them freely available to individuals, developers, and enterprises.
While 6 of these models are dense, meaning they actively use all their parameters during inference and training, the other 2, Qwen3-235B-A22B and Qwen3-30B-A3B, are Mixture-of-Experts (MoE) models that activate only a subset of their parameters per token. For instance, Qwen3-30B-A3B has 30 billion total parameters but only about 3 billion active ones.
Here’s a detailed comparison of all the 8 Qwen3 models:
| Model | Layers | Heads (Q/KV) | Tie Embedding | Context Length |
|-------|--------|--------------|---------------|----------------|
| Qwen3-0.6B | 28 | 16/8 | Yes | 32K |
| Qwen3-1.7B | 28 | 16/8 | Yes | 32K |
| Qwen3-4B | 36 | 32/8 | Yes | 32K |
| Qwen3-8B | 36 | 32/8 | No | 128K |
| Qwen3-14B | 40 | 40/8 | No | 128K |
| Qwen3-32B | 64 | 64/8 | No | 128K |
| Qwen3-30B-A3B | 48 | 32/4 | No | 128K |
| Qwen3-235B-A22B | 94 | 64/4 | No | 128K |
Here’s what the table says:
Note: The Q/KV numbers in the table are counts of attention heads for queries and for keys/values (grouped-query attention); they are not the query, key, and value vectors computed within self-attention.
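To make that distinction concrete, here's a minimal sketch (assuming standard grouped-query attention, where groups of query heads share a single key/value head, and equal head dimensions) of what the head counts in the table imply for KV-cache size:

# Rough illustration of grouped-query attention (GQA) head sharing.
# Head counts come from the table above; assuming equal head dimensions,
# the KV cache shrinks by roughly q_heads / kv_heads versus full MHA.
models = {
    "Qwen3-0.6B": (16, 8),
    "Qwen3-32B": (64, 8),
    "Qwen3-235B-A22B": (64, 4),
}
for name, (q_heads, kv_heads) in models.items():
    group = q_heads // kv_heads  # query heads sharing one KV head
    print(f"{name}: {q_heads} Q heads share {kv_heads} KV heads -> ~{group}x smaller KV cache")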
Also Read: Qwen3 Models: How to Access, Performance, Features, and Applications
Here are some of the key features of the Qwen3 models:
To use the Qwen3 models, we will access them via the OpenRouter API. Here's how to do it:
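In short: create an OpenRouter account, generate an API key from the dashboard, and point any OpenAI-compatible client at OpenRouter's endpoint. Here's a minimal sketch using the openai Python package (the your_api_key placeholder is an assumption; replace it with your own key):

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_api_key",  # replace with the key from your OpenRouter dashboard
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b:free",
    messages=[{"role": "user", "content": "Hello, Qwen3!"}],
)
print(response.choices[0].message.content)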
In this section, we'll go through the process of building AI applications using Qwen3. We will first create an AI-powered travel planner agent using the model, and then a Q/A RAG bot using LangChain.
Before building real-world AI solutions with Qwen3, we first need to cover the basic prerequisites:
In this section, we'll use Qwen3 to create an AI-powered travel agent that suggests the top tourist spots for the city or place you are visiting. We will also enable the agent to search the internet for up-to-date information and add a tool for currency conversion.
First, we will install and import the necessary libraries and tools required to build the agent.
!pip install langchain langchain-community openai duckduckgo-search
from langchain.chat_models import ChatOpenAI
from langchain.agents import Tool, initialize_agent
from langchain.tools import DuckDuckGoSearchRun
llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_api_key",
    model="qwen/qwen3-235b-a22b:free"
)
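Assuming the API key is valid, a quick one-off call is an easy way to verify the connection before building anything on top of it:

# Optional sanity check: confirm the OpenRouter connection works
print(llm.invoke("Reply with a one-line greeting.").content)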
# Web Search Tool
search = DuckDuckGoSearchRun()
# Tool for DestinationAgent
def get_destinations(destination):
    return search.run(f"Top 3 tourist spots in {destination}")
DestinationTool = Tool(
    name="Destination Recommender",
    func=get_destinations,
    description="Finds top places to visit in a city"
)
# Tool for CurrencyAgent
def convert_usd_to_inr(query):
    # Pull the first numeric value out of the query string
    amount = [float(s) for s in query.split() if s.replace('.', '', 1).isdigit()]
    if amount:
        return f"{amount[0]} USD = {amount[0] * 83.2:.2f} INR"
    return "Couldn't parse amount."
CurrencyTool = Tool(
    name="Currency Converter",
    func=convert_usd_to_inr,
    description="Converts USD to INR based on a static rate"
)
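It can help to test the tools standalone before handing them to the agent; the search output will vary since it queries the live web:

# Quick standalone checks of both tools
print(convert_usd_to_inr("Convert 250 USD to INR"))  # -> 250.0 USD = 20800.00 INR
print(get_destinations("Jaipur"))  # live web search; results vary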
Also Read: Build a Travel Assistant Chatbot with HuggingFace, LangChain, and MistralAI
Now that we have initialized all the tools, let's create an agent that will use them to give us a plan for the trip.
tools = [DestinationTool, CurrencyTool]
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True
)
def trip_planner(city, usd_budget):
    dest = get_destinations(city)
    inr_budget = convert_usd_to_inr(f"{usd_budget} USD to INR")
    return f"""Here is your travel plan:
*Top spots in {city}*:
{dest}
*Budget*:
{inr_budget}
Enjoy your day trip!"""
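Note that trip_planner stitches the tool outputs together directly, without the agent's reasoning loop; you can call it on its own:

# Call the helper directly -- deterministic, no agent involved
print(trip_planner("Delhi", 8500))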
In this section, we'll run the agent with a sample query and observe its response.
# Set the inputs
city = "Delhi"
usd_budget = 8500

# Run the agent on the trip-planning query
response = agent.run(f"Plan a day trip to {city} with a budget of {usd_budget} USD")

from IPython.display import Markdown, display
display(Markdown(response))
In this section, we'll create a RAG bot that answers queries from the relevant input documents in the knowledge base, producing informative responses using qwen/qwen3-235b-a22b. The system also uses LangChain to generate accurate, context-aware answers.

First, we will install and import the necessary libraries and tools required to build the RAG system.
!pip install langchain langchain-community langchain-core openai tiktoken chromadb sentence-transformers duckduckgo-search
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
# Load your document
loader = TextLoader("/content/my_docs.txt")
docs = loader.load()
Now that we've loaded our document, let's create embeddings from it, which will make retrieval easier and faster.
# Split into chunks
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# Embed with HuggingFace model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embedding=embeddings)
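Before wiring up the full chain, you can sanity-check the vector store with a direct similarity search (the query string here is just an example):

# Retrieve the 2 chunks closest to a sample query
for doc in db.similarity_search("What is Qwen3?", k=2):
    print(doc.page_content[:200], "\n---")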
# Setup Qwen LLM from OpenRouter
llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_API_KEY",
    model="qwen/qwen3-235b-a22b:free"
)
# Create RAG chain
retriever = db.as_retriever(search_kwargs={"k": 2})
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
# Ask a question
response = rag_chain.invoke({"query": "How can I use Qwen with MCP? Please give me a stepwise guide along with the necessary code snippets."})
display(Markdown(response['result']))
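If you also want to inspect which chunks the answer was grounded in, RetrievalQA can return the retrieved documents alongside the result; here's a small variation of the same chain:

# Same chain, but also return the retrieved chunks for inspection
rag_chain_with_sources = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
out = rag_chain_with_sources.invoke({"query": "How can I use Qwen with MCP?"})
print(out["result"])
for doc in out["source_documents"]:
    print("---", doc.page_content[:150])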
You can find the complete code here.
Here are some more applications of Qwen3 across industries:
In this article, we learned how to build Qwen3-powered agentic AI and RAG systems. Qwen3's high performance, multilingual support, and advanced reasoning capabilities make it a strong choice for knowledge retrieval and agent-based tasks. By integrating Qwen3 into RAG and agentic pipelines, we can get accurate, context-aware responses, making it a strong contender for real-world AI-powered applications.
A. Qwen3 has a hybrid reasoning capability that lets it dynamically adjust its responses, optimizing RAG workflows for both fast retrieval and complex analysis.
A. The main components include a vector database, an embedding model, a LangChain workflow, and an API to access the model.
A. Yes. With Qwen-Agent's built-in tool-calling templates, we can parse and chain sequential tool operations like web searching, data analysis, and report generation.
A. One can reduce latency in several ways, including:
1. Using MoE models like Qwen3-30B-A3B, which have only 3 billion active parameters.
2. Using GPU-optimized inference.
A. Common errors include:
1. MCP server initialization failures, such as JSON formatting and initialization issues.
2. Tool-response pairing errors.
3. Context window overflow.