The ecosystem of retrieval-augmented generation (RAG) has taken off in the last couple of years, and open-source projects aimed at helping developers build RAG applications keep appearing across the internet. And why not? RAG is an effective method for augmenting large language models (LLMs) with an external knowledge source. So we thought, why not share the best GitHub repositories for mastering RAG systems with our readers?
But before we do that, here is a little about RAG and its applications.
RAG pipelines operate in the following way: relevant documents are retrieved from an external knowledge source, added to the prompt as context, and then used by the LLM to generate a grounded answer.
As mentioned, we will explore different open-source RAG frameworks and their GitHub repositories that enable users to easily build RAG systems. The aim is to help developers, students, and tech enthusiasts choose a RAG toolkit that suits their needs and put it to use.
Retrieval-Augmented Generation has quickly emerged as one of the most impactful innovations in the field of AI. As companies focus more on building smarter, context-aware systems, mastering it is no longer optional. Companies are using RAG pipelines for chatbots, knowledge assistants, and enterprise automation to ensure that their AI models draw on real-time, domain-specific data rather than relying solely on pre-trained knowledge.
At a time when RAG is powering smarter chatbots, assistants, and enterprise tools, understanding it thoroughly can give you a real competitive edge. Knowing how to build and optimize RAG pipelines opens doors in AI development, data engineering, and automation, making you more marketable and helping future-proof your career.

In the quest for that mastery, here are the top GitHub repositories for RAG systems. But before that, a look at how these RAG frameworks actually help.
The Retrieval-Augmented Generation (RAG) framework is an AI architecture designed to improve the capabilities of LLMs by integrating external information into the response generation process. This makes LLM responses more informed and more current than the data the model was originally trained on. The model retrieves relevant documents or data from external databases, knowledge repositories, or APIs, and then uses them to answer user queries rather than relying solely on its original training data.

This enables the model to answer questions with responses that are correct, up to date, and relevant to the context. It also mitigates issues such as knowledge cut-off and hallucination (confident but incorrect responses to prompts). By connecting to both general and domain-specific knowledge sources, RAG enables an AI system to provide trustworthy, well-grounded responses.
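To make this flow concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. It uses naive keyword-overlap scoring instead of a real embedding model, and the generate() function is a stand-in for an actual LLM call; every name in it is illustrative:
# Minimal retrieve-then-generate sketch (toy scoring, no external libraries)
documents = [
    "RAG stands for retrieval-augmented generation.",
    "It combines search and LLMs for better answers.",
]
def retrieve(query, docs, k=1):
    # Rank documents by naive keyword overlap and return the top k
    query_terms = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(query_terms & set(d.lower().split())), reverse=True)
    return scored[:k]
def generate(prompt):
    # Placeholder for an LLM call (e.g., an OpenAI or local model request)
    return f"[LLM answer based on prompt]\n{prompt}"
# 1. Retrieve context relevant to the question
question = "What does RAG mean?"
context = "\n".join(retrieve(question, documents))
# 2. Augment the prompt with the retrieved context
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# 3. Generate an answer grounded in that context
print(generate(prompt))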
You can read all about RAG systems here.
Applications span use cases such as customer support, search, compliance, data analytics, and more. RAG systems also reduce the need to frequently retrain the model, since up-to-date information is retrieved at query time rather than baked into the model's weights.
Now that we know how RAG systems help, let us explore the top GitHub repositories with detailed tutorials, code, and resources. These repositories will help you master the tools, skills, frameworks, and theory necessary for working with RAG systems.
LangChain is a complete LLM toolkit that enables developers to create sophisticated applications with features such as prompts, memory, agents, and data connectors. From loading documents to splitting text, embedding, retrieval, and generating outputs, LangChain provides modules for each step of a RAG pipeline.
LangChain (know all about it here) boasts a rich ecosystem of integrations with providers such as OpenAI, Hugging Face, Azure, and many others. It also supports several languages, including Python, JavaScript, and TypeScript. LangChain follows a modular, building-block design, allowing you to mix and match tools, build agent workflows, and use built-in chains.
Usage Example
LangChain’s high-level APIs make simple RAG pipelines concise. For example, here we use LangChain to answer a question using a small set of documents with OpenAI’s embeddings and LLM:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
# Sample documents to index
docs = ["RAG stands for retrieval-augmented generation.", "It combines search and LLMs for better answers."]
# 1. Create embeddings and vector store
vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())
# 2. Build a QA chain (LLM + retriever)
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(model_name="gpt-3.5-turbo-instruct"),  # text-davinci-003 has been retired; this is its replacement
    retriever=vectorstore.as_retriever()
)
# 3. Run the query
result = qa({"query": "What does RAG mean?"})
print(result["result"])
This code loads the docs into a FAISS vector store using OpenAI embeddings. It then uses RetrievalQA to fetch the relevant context and generate an answer. LangChain abstracts away the retrieval and the LLM call. (For additional instructions, please refer to the LangChain APIs and tutorials.)
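Note that the imports above use LangChain's older, pre-packaged chain interface. Recent releases favor the LCEL (LangChain Expression Language) style with the langchain-openai and langchain-community packages. Here is a rough sketch of the same pipeline in that style; it assumes an OPENAI_API_KEY is set in the environment, and the model name is only an example:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
docs = ["RAG stands for retrieval-augmented generation.", "It combines search and LLMs for better answers."]
retriever = FAISS.from_texts(docs, OpenAIEmbeddings()).as_retriever()
def format_docs(documents):
    # Join retrieved Document objects into a single context string
    return "\n".join(d.page_content for d in documents)
prompt = ChatPromptTemplate.from_template(
    "Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
# Pipe retrieval, prompt formatting, the chat model, and output parsing together
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")
    | StrOutputParser()
)
print(chain.invoke("What does RAG mean?"))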
For more, check LangChain’s GitHub repository here.
Haystack, by deepset, is an enterprise-oriented RAG framework built around composable pipelines. The main idea is a graph-like pipeline in which you wire together nodes (i.e., components) such as retrievers, readers, and generators into a directed graph. Haystack is designed for production deployment and offers many backend choices, such as Elasticsearch, OpenSearch, Milvus, Qdrant, and more, for document storage and retrieval.
Usage Example
Below is a simplified example using Haystack’s 1.x pipeline API to create a small RAG pipeline (a sketch of the newer 2.x equivalent follows the explanation below):
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, OpenAIAnswerGenerator
from haystack.pipelines import Pipeline
# 1. Prepare a document store (BM25 must be enabled for the BM25Retriever)
doc_store = InMemoryDocumentStore(use_bm25=True)
documents = [{"content": "RAG stands for retrieval-augmented generation."}]
doc_store.write_documents(documents)
# 2. Set up retriever and generator
retriever = BM25Retriever(document_store=doc_store)
generator = OpenAIAnswerGenerator(api_key="YOUR_OPENAI_API_KEY", model="text-davinci-003")
# 3. Build the pipeline (the first node takes the raw query as its input)
pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipe.add_node(component=generator, name="Generator", inputs=["Retriever"])
# 4. Run the RAG query
result = pipe.run(query="What does RAG mean?")
print(result["answers"][0].answer)
This code writes one document into an in-memory store, uses BM25 to find relevant text, then asks the OpenAI model to answer. Haystack’s Pipeline orchestrates the flow. For more, check deepset’s Haystack repository here.
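Haystack 2.x reorganizes the same flow around components that you add to a pipeline and connect explicitly. Below is a minimal sketch of the equivalent 2.x pipeline; it assumes an OPENAI_API_KEY is set in the environment, and the model name is only an example:
from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
# 1. Write a document into the in-memory store
store = InMemoryDocumentStore()
store.write_documents([Document(content="RAG stands for retrieval-augmented generation.")])
# 2. A Jinja2 prompt template that receives the retrieved documents
template = """Answer using only the context below.
Context:
{% for doc in documents %}{{ doc.content }}{% endfor %}
Question: {{ question }}"""
# 3. Add components and wire them together
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=store))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")
# 4. Run the pipeline
question = "What does RAG mean?"
result = pipe.run({"retriever": {"query": question}, "prompt_builder": {"question": question}})
print(result["llm"]["replies"][0])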
Also, check out how to build an Agentic QA RAG system using Haystack here.
LlamaIndex, formerly known as GPT Index, is a data-centric RAG framework focused on indexing and querying your data for LLM use. Consider LlamaIndex as a set of tools used to build custom indexes over documents (vectors, keyword indexes, graphs) and then query them. LlamaIndex is a powerful way to connect different data sources like text files, APIs, and SQL to LLMs using index structures.
For example, you can create a vector index of all of your files, and then use a built-in query engine to answer any questions you may have, all using LlamaIndex. LlamaIndex supplies high-level APIs and low-level modules to be able to customize every part of the RAG process.
Usage Example
LlamaIndex makes it very easy to create a searchable index from documents. For instance, using the core API:
# Note: in llama_index>=0.10 these imports live under llama_index.core
from llama_index import VectorStoreIndex, SimpleDirectoryReader
# 1. Load documents (all files in the 'data' directory)
documents = SimpleDirectoryReader("./data").load_data()
# 2. Build a vector store index from the docs
index = VectorStoreIndex.from_documents(documents)
# 3. Create a query engine from the index
query_engine = index.as_query_engine()
# 4. Run a query against the index
response = query_engine.query("What does RAG mean?")
print(response)
This code will read files in the ./data directory, index them in memory, and then query the index. LlamaIndex returns the answer as a string. For more, check the LlamaIndex repository here.
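By default, the index above lives only in memory. A common follow-up, sketched here with the same import style (the persist_dir path is just an example), is to persist the index to disk and reload it later without re-embedding the documents:
from llama_index import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
# Build the index once and persist it to a local directory (example path)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")
# Later: reload the index from disk instead of re-embedding the documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
print(index.as_query_engine().query("What does RAG mean?"))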
Or, build a RAG pipeline using LlamaIndex. Here is how.
RAGFlow, from InfiniFlow, is a RAG engine designed for enterprises with complex, large-scale data. Its stated goal is “deep document understanding”: parsing formats such as PDFs, scanned documents, images, and tables, and distilling them into well-organized chunks.
RAGFlow features an integrated retrieval model along with agent templates and visual tooling for debugging. Key elements are advanced template-based chunking of documents and the notion of grounded citations, which helps reduce hallucinations because you can see which source texts support which answer.
Usage Example
import requests
# RAGFlow exposes an OpenAI-compatible chat completions endpoint
api_url = "http://localhost:8000/api/v1/chats_openai/default/chat/completions"
api_key = "YOUR_RAGFLOW_API_KEY"
headers = {"Authorization": f"Bearer {api_key}"}
data = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is RAG?"}],
    "stream": False
}
response = requests.post(api_url, headers=headers, json=data)
print(response.json()["choices"][0]["message"]["content"])
This example illustrates RAGFlow’s OpenAI-compatible chat completion API. It sends a chat message to the “default” assistant, which answers using the indexed documents as context. For more, check the repository.
txtai is an all-in-one AI framework that provides semantic search, embeddings, and RAG pipelines. It comes with an embeddable vector database built on SQLite and FAISS, plus utilities for orchestrating LLM calls. With txtai, once you have created an Embeddings index over your text data, you can either wire it to an LLM manually in code or use the built-in RAG helper.
What I really like about txtai is its simplicity: it can run 100% locally (no cloud), it ships with a built-in RAG pipeline template, and it even provides autogenerated FastAPI services. It is also open source (Apache 2.0) and easy to prototype and deploy.
You can read more about txtai here.
Usage Example
Here’s how simple it is to run a RAG query in txtai using the built-in pipeline:
from txtai import Embeddings, LLM, RAG
# 1. Initialize txtai components
embeddings = Embeddings(content=True)  # local FAISS+SQLite index; content=True stores the text for retrieval
embeddings.index([{"id": "doc1", "text": "RAG stands for retrieval-augmented generation."}])
llm = LLM("text-davinci-003")  # placeholder; use any model txtai's LLM supports (Hugging Face path or API model)
# 2. Create a RAG pipeline
prompt = "Answer the question using only the context below.\n\nQuestion: {question}\nContext: {context}"
rag = RAG(embeddings, llm, template=prompt)
# 3. Run the RAG query
result = rag("What does RAG mean?", maxlength=512)
print(result["answer"])
This code snippet indexes a single document and runs a RAG pipeline over it. The RAG helper handles retrieving relevant passages from the vector index and filling in {context} in the prompt template. For more, check the repository here.
LLMWare is a complete RAG framework with a strong emphasis on smaller, specialized models and secure, fast inference. While most frameworks assume a large cloud LLM, LLMWare runs RAG pipelines on a desktop or local server with modest computing power. This limits the risk of data exposure while still supporting large-scale pilot studies and a range of applications.
LLMWare ships with no-code wizards and templates for the usual RAG functionality, including document parsing and indexing. It also has tooling for various document formats (Office and PDF), a useful first step toward document analysis.
Usage Example
LLMWare’s API is designed to be easy to use. Here’s a basic example based on their docs:
from llmware.prompts import Prompt
# 1. Load a model for prompting
prompter = Prompt().load_model("llmware/bling-tiny-llama-v0")
# 2. (Optionally) add a document as a source of context
prompter.add_source_document("./data", "doc.pdf", query="What is RAG?")
# 3. Run the query with context
response = prompter.prompt_with_source("What is RAG?")
print(response)
This code uses an LLMWare Prompt object. We first load a model (for example, a small Llama-based model from Hugging Face). We then point it at a folder containing source documents; LLMWare parses “doc.pdf” into chunks and filters them by relevance to the user’s question. The prompt_with_source function then makes a request, passing the relevant context from the source, and returns a text answer with metadata. For more, check the repository here.
Cognita by TrueFoundry is a production-ready RAG framework built for scalability and collaboration. It is primarily about making it easy to go from a notebook or experiment to a deployed service, wrapping your RAG pipeline code in a structured layer with APIs and a no-code UI. Cognita uses LangChain/LlamaIndex modules under the hood, but organizes them into data loaders, parsers, embedders, retrievers, and metric modules. It supports incremental indexing and has a web UI that lets non-developers upload documents, pick models, and query them in real time.
Usage Example
Cognita is primarily accessed through its command-line interface and internal API, but here is a conceptual pseudo-snippet using its Python API:
# NOTE: conceptual only; these class and method names are illustrative, not Cognita's actual API
from cognita.pipeline import Pipeline
from cognita.schema import Document
# Initialize a new RAG pipeline
pipeline = Pipeline.create("rag")
# Add documents (with text content)
docs = [Document(id="1", text="RAG stands for retrieval-augmented generation.")]
pipeline.index_documents(docs)
# Query the pipeline
result = pipeline.query("What does RAG mean?")
print(result['answer'])
In a real implementation, you would configure Cognita via YAML or use its CLI to load the data and start a service. The snippet above just illustrates the flow: you create a pipeline, index your data, then ask questions. For more, check the complete documentation and the repository here.
These open-source GitHub repositories for RAG systems offer extensive toolkits for developers, researchers, and hobbyists.
All of the GitHub repositories for RAG systems above are actively maintained and come with examples to help you get up and running easily. Collectively, they demonstrate that RAG is no longer confined to cutting-edge academic research; it is now available to anyone who wants to build an AI application. In practice, the “best option” depends on your needs and priorities.