Qwen has just released 8 new models as part of its latest family, Qwen3, showcasing promising capabilities. The flagship model, Qwen3-235B-A22B, outperformed most other models, including DeepSeek-R1, OpenAI's o1 and o3-mini, Grok 3, and Gemini 2.5-Pro, on standard benchmarks. Meanwhile, the small Qwen3-30B-A3B outperformed QwQ-32B, which has approximately 10 times as many activated parameters as the new model. With such advanced capabilities, these models are a great choice for a wide range of applications. In this article, we will explore the features of all the Qwen3 models and learn how to use them to build RAG systems and AI agents.
Qwen3 is the latest series of large language models (LLMs) in the Qwen family, consisting of 8 different models: Qwen3-235B-A22B, Qwen3-30B-A3B, Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, and Qwen3-0.6B. All of these models are released under the Apache 2.0 license, making them freely available to individuals, developers, and enterprises.
While 6 of these models are dense, meaning they actively use all their parameters during inference and training, the other 2, Qwen3-235B-A22B and Qwen3-30B-A3B, are Mixture-of-Experts (MoE) models that activate only a subset of their parameters per token. For instance, Qwen3-30B-A3B has 30 billion total parameters but only about 3 billion active ones.
Here’s a detailed comparison of all the 8 Qwen3 models:
| Model | Layers | Heads (Q/KV) | Tie Embedding | Context Length |
|-------|--------|--------------|---------------|----------------|
| Qwen3-0.6B | 28 | 16/8 | Yes | 32K |
| Qwen3-1.7B | 28 | 16/8 | Yes | 32K |
| Qwen3-4B | 36 | 32/8 | Yes | 32K |
| Qwen3-8B | 36 | 32/8 | No | 128K |
| Qwen3-14B | 40 | 40/8 | No | 128K |
| Qwen3-32B | 64 | 64/8 | No | 128K |
| Qwen3-30B-A3B | 48 | 32/4 | No | 128K |
| Qwen3-235B-A22B | 94 | 64/4 | No | 128K |
Here’s what the table says:
Note: The Q/KV numbers in the table are counts of attention heads for queries and for keys/values (grouped-query attention); they are not the query, key, and value vectors computed within self-attention.
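To make that distinction concrete, here's a minimal sketch (assuming standard grouped-query attention, where groups of query heads share a single key/value head, and equal head dimensions) of what the head counts in the table imply for KV-cache size:

# Rough illustration of grouped-query attention (GQA) head sharing.
# Head counts come from the table above; assuming equal head dimensions,
# the KV cache shrinks by roughly q_heads / kv_heads versus full MHA.
models = {
    "Qwen3-0.6B": (16, 8),
    "Qwen3-32B": (64, 8),
    "Qwen3-235B-A22B": (64, 4),
}
for name, (q_heads, kv_heads) in models.items():
    group = q_heads // kv_heads  # query heads sharing one KV head
    print(f"{name}: {q_heads} Q heads share {kv_heads} KV heads -> ~{group}x smaller KV cache")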
Also Read: Qwen3 Models: How to Access, Performance, Features, and Applications
Here are some of the key features of the Qwen3 models:
To use the Qwen3 models, we will access them via the OpenRouter API. Here's how to do it:
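In short: create an OpenRouter account, generate an API key from the dashboard, and point any OpenAI-compatible client at OpenRouter's endpoint. Here's a minimal sketch using the openai Python package (the your_api_key placeholder is an assumption; replace it with your own key):

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_api_key",  # replace with the key from your OpenRouter dashboard
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b:free",
    messages=[{"role": "user", "content": "Hello, Qwen3!"}],
)
print(response.choices[0].message.content)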
In this section, we'll go through the process of building AI applications using Qwen3. We will first create an AI-powered travel planner agent using the model, and then a Q/A RAG bot using LangChain.
Before building real-world AI solutions with Qwen3, we first need to cover the basic prerequisites:
In this section, we'll use Qwen3 to create an AI-powered travel agent that suggests the top tourist spots for the city or place you are visiting. We will also enable the agent to search the internet for up-to-date information and add a tool for currency conversion.
First, we will install and import the necessary libraries and tools required to build the agent.
!pip install langchain langchain-community openai duckduckgo-search
from langchain.chat_models import ChatOpenAI
from langchain.agents import Tool, initialize_agent
from langchain.tools import DuckDuckGoSearchRun
llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="your_api_key",
    model="qwen/qwen3-235b-a22b:free"
)
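Assuming the API key is valid, a quick one-off call is an easy way to verify the connection before building anything on top of it:

# Optional sanity check: confirm the OpenRouter connection works
print(llm.invoke("Reply with a one-line greeting.").content)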
# Web Search Tool
search = DuckDuckGoSearchRun()
# Tool for DestinationAgent
def get_destinations(destination):
    return search.run(f"Top 3 tourist spots in {destination}")
DestinationTool = Tool(
    name="Destination Recommender",
    func=get_destinations,
    description="Finds top places to visit in a city"
)
# Tool for CurrencyAgent
def convert_usd_to_inr(query):
    # Pull the first numeric value out of the query string
    amount = [float(s) for s in query.split() if s.replace('.', '', 1).isdigit()]
    if amount:
        return f"{amount[0]} USD = {amount[0] * 83.2:.2f} INR"
    return "Couldn't parse amount."
CurrencyTool = Tool(
    name="Currency Converter",
    func=convert_usd_to_inr,
    description="Converts USD to INR based on a static rate"
)
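It can help to test the tools standalone before handing them to the agent; the search output will vary since it queries the live web:

# Quick standalone checks of both tools
print(convert_usd_to_inr("Convert 250 USD to INR"))  # -> 250.0 USD = 20800.00 INR
print(get_destinations("Jaipur"))  # live web search; results vary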
Also Read: Build a Travel Assistant Chatbot with HuggingFace, LangChain, and MistralAI
Now that we have initialized all the tools, let's create an agent that will use them to give us a plan for the trip.
tools = [DestinationTool, CurrencyTool]
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent="zero-shot-react-description",
    verbose=True
)
def trip_planner(city, usd_budget):
    dest = get_destinations(city)
    inr_budget = convert_usd_to_inr(f"{usd_budget} USD to INR")
    return f"""Here is your travel plan:
*Top spots in {city}*:
{dest}
*Budget*:
{inr_budget}
Enjoy your day trip!"""
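Note that trip_planner stitches the tool outputs together directly, without the agent's reasoning loop; you can call it on its own:

# Call the helper directly -- deterministic, no agent involved
print(trip_planner("Delhi", 8500))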
In this section, we'll run the agent with a sample query and observe its response.
# Set the inputs
city = "Delhi"
usd_budget = 8500

# Run the agent on the trip-planning query
response = agent.run(f"Plan a day trip to {city} with a budget of {usd_budget} USD")

from IPython.display import Markdown, display
display(Markdown(response))
In this section, we'll create a RAG bot that answers queries from the relevant input documents in the knowledge base, producing informative responses using qwen/qwen3-235b-a22b. The system also uses LangChain to generate accurate, context-aware answers.

First, we will install and import the necessary libraries and tools required to build the RAG system.
!pip install langchain langchain-community langchain-core openai tiktoken chromadb sentence-transformers duckduckgo-search
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
# Load your document
loader = TextLoader("/content/my_docs.txt")
docs = loader.load()
Now that we've loaded our document, let's create embeddings from it, which will make retrieval easier and faster.
# Split into chunks
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# Embed with HuggingFace model
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(chunks, embedding=embeddings)
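Before wiring up the full chain, you can sanity-check the vector store with a direct similarity search (the query string here is just an example):

# Retrieve the 2 chunks closest to a sample query
for doc in db.similarity_search("What is Qwen3?", k=2):
    print(doc.page_content[:200], "\n---")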
# Setup Qwen LLM from OpenRouter
llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_API_KEY",
    model="qwen/qwen3-235b-a22b:free"
)
# Create RAG chain
retriever = db.as_retriever(search_kwargs={"k": 2})
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
# Ask a question
response = rag_chain.invoke({"query": "How can I use Qwen with MCP? Please give me a stepwise guide along with the necessary code snippets."})
display(Markdown(response['result']))
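If you also want to inspect which chunks the answer was grounded in, RetrievalQA can return the retrieved documents alongside the result; here's a small variation of the same chain:

# Same chain, but also return the retrieved chunks for inspection
rag_chain_with_sources = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
out = rag_chain_with_sources.invoke({"query": "How can I use Qwen with MCP?"})
print(out["result"])
for doc in out["source_documents"]:
    print("---", doc.page_content[:150])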
You can find the complete code here.
Here are some more applications of Qwen3 across industries:
In this article, we learned how to build Qwen3-powered agentic AI and RAG systems. Qwen3's high performance, multilingual support, and advanced reasoning capabilities make it a strong choice for knowledge retrieval and agent-based tasks. By integrating Qwen3 into RAG and agentic pipelines, we can get accurate, context-aware responses, making it a strong contender for real-world AI-powered applications.
A. Qwen3 has a hybrid reasoning capability that lets it dynamically adjust its responses, optimizing RAG workflows for both fast retrieval and complex analysis.
A. The main components include a vector database, an embedding model, a LangChain workflow, and an API to access the model.
A. Yes. With Qwen-Agent's built-in tool-calling templates, we can parse and chain sequential tool operations like web searching, data analysis, and report generation.
A. One can reduce latency in several ways, including:
1. Using MoE models like Qwen3-30B-A3B, which have only 3 billion active parameters.
2. Using GPU-optimized inference.
A. Common errors include:
1. MCP server initialization failures, such as JSON formatting and initialization issues.
2. Tool-response pairing errors.
3. Context window overflow.