How to Perform RAG using MCP?

Harsh Mishra Last Updated : 05 Jun, 2025
6 min read

Tired of seeing your AI give vague answers because it has no access to live data? Bored of writing the same RAG code for your local data again and again? Both of these problems can be solved by integrating RAG with MCP (Model Context Protocol). With MCP, you can connect your AI assistant to external tools and APIs and perform true RAG seamlessly. MCP is a game changer in how AI models communicate with live data, while RAG gives AI models external knowledge they were never trained on. In this article, we'll dive into how RAG and MCP work together and walk through a working example.

What is RAG?

RAG is an AI framework that combines the strengths of traditional information retrieval systems (such as search engines and databases) with the capabilities of AI models that excel at natural language generation. Its benefits include real-time, factual responses, reduced hallucinations, and context-aware answers. Using RAG is like consulting a librarian before writing a detailed report.
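At its core, RAG is a two-step loop: retrieve the most relevant context for a query, then generate an answer grounded in that context. The sketch below is purely illustrative; retrieve_relevant_chunks and llm are hypothetical placeholders, not functions from any specific library.

# Illustrative RAG flow (hypothetical helpers, not a real library's API)
def answer_with_rag(query: str) -> str:
    # Step 1 - Retrieval: fetch the chunks most similar to the query from a knowledge store
    chunks = retrieve_relevant_chunks(query, top_k=3)   # hypothetical retriever
    context = "\n\n".join(chunks)

    # Step 2 - Generation: let the LLM answer using only the retrieved context
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return llm(prompt)                                   # hypothetical LLM call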


Learn more about RAG in this article.

What is MCP?

MCP acts as a bridge between your AI assistant and external tools. It’s an open protocol that lets LLMs access real-world tools, APIs, or datasets accurately and efficiently. Traditional APIs and tools require custom integration code for every AI model, whereas MCP provides a generic, plug-and-play way to connect tools to LLMs.
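To see how little glue code this takes, here is a minimal sketch of an MCP server exposing a single tool with FastMCP (the same helper we use later in this article); the tool itself is just a toy example.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # any MCP client (e.g. Cursor) can now discover and call the add tool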


Learn more about MCP in this article.

How does it enable RAG?

In RAG, MCP acts as the retrieval layer that fetches the most relevant chunks of information from your database based on your query. It standardizes how you interact with your data sources, so you no longer have to write custom retrieval code for every RAG pipeline you build, and it enables dynamic tool use based on the AI’s reasoning.
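Under the hood, every tool call travels over the same protocol-level message shape (JSON-RPC with a tools/call method), which is why no per-database glue code is needed. A simplified illustration of such a call, using the retrieve tool we build below, looks roughly like this:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "retrieve",
    "arguments": { "prompt": "What is Zephyria?" }
  }
}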

Steps for Performing RAG with MCP

Now let’s implement RAG with MCP step by step. Follow these steps to create your first MCP server that performs RAG.

Firstly, we will set up our RAG MCP server.

Step 1: Installing the dependencies

pip install "langchain>=0.1.0" \
           "langchain-community>=0.0.5" \
           "langchain-groq>=0.0.2" \
           "mcp>=1.9.1" \
           "chromadb>=0.4.22" \
           "huggingface-hub>=0.20.3" \
           "transformers>=4.38.0" \
           "sentence-transformers>=2.2.2"

This step installs all the required libraries on your system (the quotes around each requirement stop your shell from treating >= as output redirection).

Step 2: Creating server.py

Next, we define the RAG MCP server in a server.py file. The code below builds a simple RAG pipeline and exposes it to MCP as a tool.

from mcp.server.fastmcp import FastMCP
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import TextLoader  # community import avoids deprecation warnings
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_groq import ChatGroq  # Groq LLM


# Create an MCP server
mcp = FastMCP("RAG")


# Set up embeddings (You can pick a different Hugging Face model if preferred)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")


# Set up Groq LLM
model = ChatGroq(
   model_name="llama3-8b-8192",  # or another Groq-supported model
   groq_api_key="YOUR_GROQ_API"  # Required if not set via environment variable
)


# Load documents
loader = TextLoader("dummy.txt")
data = loader.load()


# Document splitting
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)


# Vector DB
docsearch = Chroma.from_documents(texts, embeddings)


# Retriever chain
qa = RetrievalQA.from_chain_type(llm=model, retriever=docsearch.as_retriever())


@mcp.tool()
def retrieve(prompt: str) -> str:
   """Get information using RAG"""
   # RetrievalQA.invoke returns a dict; return only the generated answer text
   return qa.invoke(prompt)["result"]


if __name__ == "__main__":
   mcp.run()

Here, we are using the Groq API to access the LLM, so make sure you have a Groq API key. The dummy.txt file can be any text data you have; change its contents to suit your use case.
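If you prefer not to hard-code the key, you can export it as an environment variable before starting the server (ChatGroq falls back to GROQ_API_KEY when groq_api_key is not passed); the value below is a placeholder. You can also fill dummy.txt with any quick test content, for example a few sentences about the fictional planet Zephyria used in the queries later in this article.

export GROQ_API_KEY="your_groq_api_key"   # placeholder key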

We have now successfully created the RAG MCP server. To check it, run it with Python in the terminal.

python server.py
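When run directly like this, the server communicates over stdio and simply waits for a client to connect, so you may not see any output. If you want to sanity-check it without an IDE, a minimal client sketch using the same mcp package might look like the following (the query string is just an example):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch server.py as a subprocess and talk to it over stdio
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("retrieve", arguments={"prompt": "What is Zephyria?"})
            print(result.content)

asyncio.run(main())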

Step 3: Configuring Cursor for MCP

Let’s configure the Cursor IDE for testing our server.

  1. Download Cursor from the official website https://www.cursor.com/downloads.
  2. Install it, sign up, and get to the home screen.
  3. Go to File in the header toolbar, then click Preferences, and then Cursor Settings.
  4. In Cursor Settings, click on MCP.
  5. On the MCP tab, click Add new global MCP Server.

This will open an mcp.json file. Paste the following code into it and save the file.

Replace  /path/to/python with the path to your Python executable and /path/to/server.py with your server.py path.

{
  "mcpServers": {
    "rag-server": {
      "command": "/path/to/python",
      "args": [
        "/path/to/server.py"
      ]
    }
  }
}
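If you are unsure of the interpreter path, you can print it from the environment where you installed the dependencies, for example:

which python    # macOS/Linux
where python    # Windows

Using the absolute path of that environment ensures the server starts with all the libraries from Step 1.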
  6. Go back to Cursor Settings; you should now see rag-server listed under MCP Servers.

If the rag-server entry appears there, your server is running successfully and is connected to the Cursor IDE. If it shows errors, try the restart button in the top-right corner.

We have successfully set up the MCP server in the Cursor IDE. Now, let’s test the server.

Step 4: Testing the MCP Server

Our RAG MCP server can now perform RAG, retrieving the most relevant chunks for a query. Let’s test it with a few queries.

Query: “What is Zephyria, Answer using rag-server”

Output:


Query: “What was the conflict in the planet?”

Output:


Query: “What is the capital of Zephyria?”

Output:


Use Cases for RAG with MCP

There are many use cases for RAG with MCP, including:

  • Search news articles for summarization
  • Query financial APIs for market updates
  • Load private documents for context-aware answers
  • Fetch weather or location-based info before answering
  • Use PDFs or database connectors to power enterprise search
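As an example of the last point, extending the server from Step 2 with a second tool for PDF retrieval only takes a few extra lines. The sketch below assumes the pypdf package is installed (it is not in the Step 1 dependency list) and that a report.pdf file exists; it reuses the embeddings, text_splitter, model, and mcp objects already defined in server.py.

from langchain_community.document_loaders import PyPDFLoader

# Build a separate index over a PDF and expose it as its own MCP tool
pdf_docs = PyPDFLoader("report.pdf").load()                 # assumed sample file
pdf_chunks = text_splitter.split_documents(pdf_docs)
pdf_search = Chroma.from_documents(pdf_chunks, embeddings)
pdf_qa = RetrievalQA.from_chain_type(llm=model, retriever=pdf_search.as_retriever())

@mcp.tool()
def retrieve_pdf(prompt: str) -> str:
    """Answer questions from the indexed PDF."""
    return pdf_qa.invoke(prompt)["result"]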

Conclusion

RAG, when powered by MCP, can completely change the way you work with your AI assistant. It transforms your AI from a simple text generator into a live assistant that retrieves and reasons over up-to-date information. Integrating the two can increase your productivity and efficiency over time. With just the steps above, anyone can build AI applications connected to the real world using RAG with MCP. Now it’s time to give your LLM superpowers by setting up your own MCP tools.

Frequently Asked Questions

Q1. What is the difference between RAG and traditional LLM responses?

A. Traditional LLMs generate responses based solely on their pre-trained knowledge, which may be outdated or incomplete. RAG enhances this by retrieving real-time or external data (documents, APIs) before answering, ensuring more accurate and up-to-date responses.

Q2. Why should I use MCP for RAG instead of writing custom code?

A. MCP eliminates the need to hardcode every API or database integration manually. It provides a plug-and-play mechanism to expose tools that AI models can dynamically use based on context, making RAG implementation faster, scalable, and more maintainable.

Q3. Do I need to be an expert in AI or LangChain to use RAG with MCP?

A. Not at all. With basic Python knowledge and following the step-by-step setup, you can create your own RAG-powered MCP server. Tools like LangChain and Cursor IDE make the integration straightforward.

Harsh Mishra is an AI/ML Engineer who spends more time talking to Large Language Models than actual humans. Passionate about GenAI, NLP, and making machines smarter (so they don’t replace him just yet). When not optimizing models, he’s probably optimizing his coffee intake. 🚀☕
