Tired of seeing AI give vague answers when it doesn’t have access to live data? Bored of writing the same code to perform RAG on local data again and again? Both problems can be solved by integrating RAG with MCP (Model Context Protocol). With MCP, you can connect your AI assistant to external tools and APIs so it can perform true RAG seamlessly. MCP changes how AI models communicate with live data, while RAG supplies models with external knowledge they were never trained on. In this article, we’ll take a deep dive into integrating RAG with MCP, look at what they can do together, and walk through a working example.
RAG is an AI framework that combines the strengths of traditional information retrieval systems (such as search engines and databases) with the capabilities of AI models that excel at natural language generation. Its benefits include real-time, factual responses, reduced hallucinations, and context-aware answers. RAG is like consulting a librarian before writing a detailed report.
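Conceptually, the flow is retrieve first, then generate. Below is a minimal, self-contained sketch of that pattern; the toy keyword scoring stands in for a real embedding search, and in a real pipeline the final prompt would be sent to an LLM rather than printed.

corpus = [
    "The warranty covers manufacturing defects for 24 months.",
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am to 5pm.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by naive keyword overlap (a real system uses embeddings)
    words = query.lower().split()
    ranked = sorted(corpus, key=lambda doc: sum(w in doc.lower() for w in words), reverse=True)
    return ranked[:k]

def answer(query: str) -> str:
    # Stuff the retrieved context into the prompt; in a real pipeline this
    # augmented prompt goes to an LLM for the final answer
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(answer("How long does the warranty last?"))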
Learn more about RAG in this article.
MCP acts as a bridge between your AI assistant and external tools. It’s an open protocol that lets LLMs access real-world tools, APIs, and datasets accurately and efficiently. Traditional APIs and tools require custom code to integrate with AI models, but MCP provides a generic, plug-and-play way to connect tools to LLMs.
Learn more about MCP in this article.
In RAG, MCP acts as the retrieval layer, fetching the most relevant chunks of information from your database based on your query. It standardizes how you interact with your data sources, so you no longer have to write custom retrieval code for every RAG pipeline you build, and it enables dynamic tool use based on the AI’s reasoning.
Now we are going to implement RAG with MCP step by step. Follow along to create your first MCP server that performs RAG.
First, we will set up our RAG MCP server by installing the required dependencies.
pip install "langchain>=0.1.0" \
    "langchain-community>=0.0.5" \
    "langchain-groq>=0.0.2" \
    "mcp>=1.9.1" \
    "chromadb>=0.4.22" \
    "huggingface-hub>=0.20.3" \
    "transformers>=4.38.0" \
    "sentence-transformers>=2.2.2"
This will install all the required libraries on your system; the version specifiers are quoted so the shell doesn’t treat >= as a redirect.
Now we’ll define the RAG MCP server in a server.py file. The following code contains a simple RAG pipeline with an MCP server wrapped around it.
from mcp.server.fastmcp import FastMCP
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_groq import ChatGroq  # Groq LLM

# Create an MCP server
mcp = FastMCP("RAG")

# Set up embeddings (you can pick a different Hugging Face model if preferred)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Set up the Groq LLM
model = ChatGroq(
    model_name="llama3-8b-8192",  # or another Groq-supported model
    groq_api_key="YOUR_GROQ_API"  # required if not set via an environment variable
)

# Load documents
loader = TextLoader("dummy.txt")
data = loader.load()

# Split documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

# Vector DB
docsearch = Chroma.from_documents(texts, embeddings)

# Retriever chain
qa = RetrievalQA.from_chain_type(llm=model, retriever=docsearch.as_retriever())

@mcp.tool()
def retrieve(prompt: str) -> str:
    """Get information using RAG"""
    result = qa.invoke(prompt)
    return result["result"]  # qa.invoke returns a dict; extract the answer text

if __name__ == "__main__":
    mcp.run()
Here, we are using the Groq API to access the LLM, so make sure you have a Groq API key. The dummy.txt file can contain any data you like; change its contents to match your use case.
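If you don’t have a document handy, a few lines of fictional content are enough for testing. The snippet below writes a hypothetical dummy.txt about the made-up planet Zephyria that the later test queries refer to; the original article’s actual file contents aren’t shown, so treat this as invented sample data and swap in your own documents for real use.

# Write a small dummy.txt for testing (hypothetical, invented sample content)
sample_text = """Zephyria is a fictional planet known for its floating crystal islands.
The capital of Zephyria is Aurelia City, built around a levitating spire.
Long ago, a conflict broke out between the sky clans over control of the crystals.
The conflict ended when the clans agreed to share the crystal harvest equally.
"""

with open("dummy.txt", "w", encoding="utf-8") as f:
    f.write(sample_text)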
We have now created the RAG MCP server. To check it, run it with Python in the terminal:
python server.py
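Note that running the script directly starts the server on stdio, where it simply waits for an MCP client to connect, so the terminal will appear to hang; that’s expected. If you want to exercise the retrieve tool without Cursor, a small client sketch like the one below should work, assuming the official mcp Python SDK’s stdio client API (the query string is just an example):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch server.py as a subprocess and talk to it over stdio
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Tools:", [t.name for t in tools.tools])
            result = await session.call_tool("retrieve", {"prompt": "What is Zephyria?"})
            print(result.content)

asyncio.run(main())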
Let’s configure the Cursor IDE for testing our server.
Opening Cursor’s MCP settings will open an mcp.json file. Paste the following code into it and save the file.
Replace /path/to/python with the path to your Python executable and /path/to/server.py with the path to your server.py.
{
  "mcpServers": {
    "rag-server": {
      "command": "/path/to/python",
      "args": [
        "/path/to/server.py"
      ]
    }
  }
}
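If you’re unsure of these paths, a small convenience script (not part of the tutorial itself) can print the exact values for your machine; run it from your project directory:

# Print the absolute paths needed for mcp.json
import os
import sys

print("command:", sys.executable)                # your Python interpreter
print("args:   ", os.path.abspath("server.py"))  # absolute path to server.py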
If the server appears as connected in Cursor’s MCP settings, it is running successfully and linked to the Cursor IDE. If it’s showing errors, try the restart button in the top-right corner.
We have successfully set up the MCP server in the Cursor IDE. Our RAG MCP server can now retrieve the most relevant chunks based on a query. Let’s test it with a few questions.
Query: “What is Zephyria? Answer using rag-server.”
Output:
Query: “What was the conflict in the planet?”
Output:
Query: “What is the capital of Zephyria?”
Output:
There are many use cases for RAG with MCP, from question answering over private documents to enterprise knowledge search and support assistants grounded in up-to-date data.
RAG, when powered by MCP, can completely change the way you talk to your AI assistant. It transforms your AI from a simple text generator into a live assistant that retrieves and reasons over real information. Integrating the two can boost your productivity and efficiency over time. With just the few steps outlined above, anyone can build AI applications connected to the real world using RAG with MCP. Now it’s time to give your LLM superpowers by setting up your own MCP tools.
Q. How is RAG different from a traditional LLM?
A. Traditional LLMs generate responses based solely on their pre-trained knowledge, which may be outdated or incomplete. RAG enhances this by retrieving real-time or external data (documents, APIs) before answering, ensuring more accurate and up-to-date responses.
Q. Why use MCP instead of hardcoding integrations?
A. MCP eliminates the need to hardcode every API or database integration manually. It provides a plug-and-play mechanism to expose tools that AI models can dynamically use based on context, making RAG implementations faster to build, more scalable, and more maintainable.
Q. Do I need advanced coding skills to build a RAG-powered MCP server?
A. Not at all. With basic Python knowledge and the step-by-step setup above, you can create your own RAG-powered MCP server. Tools like LangChain and the Cursor IDE make the integration straightforward.