Advanced RAG Technique: LangChain ReAct and Cohere

Ritika Gupta 06 May, 2024 • 9 min read

Introduction

This article explores Adaptive Question-Answering (QA) frameworks, specifically the Adaptive RAG strategy, which dynamically selects the most suitable method for large language models (LLMs) based on query complexity. It covers the learning objectives, features, and implementation of Adaptive RAG, its efficiency, and its integration with LangChain and the Cohere LLM. The article also discusses the ReAct agent’s role in classifying queries and directing them to the appropriate tools, and concludes that Adaptive RAG can revolutionize QA systems.

Adaptive RAG is an adaptive Question-Answering (QA) framework designed to select the best method for (retrieval-augmented) large language models (LLMs), ranging from basic to sophisticated, based on query complexity. The strategy was introduced as Adaptive-RAG in this paper.

Learning Objectives

  • Understand the concept and implementation of an Adaptive Question-Answering (QA) framework.
  • Learn to deploy Langchain and Cohere LLM for dynamic response selection based on query complexity.
  • Explore various applications of Adaptive RAG in real-world scenarios.
  • Gain insights into the features and benefits of Adaptive RAG for enhancing QA system efficiency.
  • Implement a simple Adaptive RAG architecture using Langchain Agent and Cohere LLM.
  • Familiarize yourself with the ReAct prompting strategy for improved decision-making in LLMs.

This article was published as a part of the Data Science Blogathon.

What is Adaptive RAG?

Adaptive-RAG presents a dynamic QA framework that can change its response method depending on query complexity. For each query it selects the most appropriate strategy, whether that is an iterative retrieval-augmented procedure, a single-step one, or bypassing retrieval completely.

To this end, the paper proposes an adaptive QA framework designed to select the most appropriate technique for (retrieval-augmented) large language models, ranging from simple to sophisticated, based on query complexity. It does this through a classifier: a smaller LM trained to predict query complexity levels from automatically acquired labels, derived from actual model predictions and inherent dataset patterns. This methodology enables a flexible strategy that seamlessly transitions between iterative and single-step retrieval-augmented LLMs, as well as non-retrieval approaches, to address a wide range of queries.

Adaptive RAG (Source: Paper)

In the diagram above we can observe a conceptual comparison of different retrieval-augmented LLM approaches to question answering. The single-step approach may not be sufficient for complex queries that require multi-step reasoning. Conversely, the multi-step approach, which iteratively retrieves documents and generates intermediate answers, is unnecessarily costly for simple queries. The adaptive approach selects the most suitable strategy based on the query complexity determined by the classifier.
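
To make this routing idea concrete, here is a minimal Python sketch of the adaptive strategy selection. The functions classify_complexity, llm_only, single_step_rag, and multi_step_rag are hypothetical placeholders (the paper trains a small LM as the classifier); this illustrates the control flow only and is not the paper’s code.

# Minimal sketch of Adaptive RAG routing (hypothetical placeholder functions)
def adaptive_answer(query: str) -> str:
    complexity = classify_complexity(query)  # small trained LM; returns e.g. "A", "B" or "C"
    if complexity == "A":    # simple: the LLM can answer from its own knowledge
        return llm_only(query)
    elif complexity == "B":  # moderate: a single round of retrieval suffices
        return single_step_rag(query)
    else:                    # complex: iterative retrieval with intermediate reasoning
        return multi_step_rag(query)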

Features of Adaptive RAG

  • Enhances overall efficiency and accuracy of Question and Answering systems.
  • Utilizes a classifier trained to predict query complexity.
  • Achieves a balance between sophisticated and simpler strategies.

What is ReAct?

In this implementation we use the simple architecture depicted in the flowchart. LangChain’s ReAct agent will act as the classifier in the context of Adaptive RAG here: it analyses the query and determines its type in order to route it to the correct tool or option.

 ReAct

ReAct Framework Prompting

ReAct (Reasoning + Acting) is a prompting strategy created by Princeton University academics in partnership with Google researchers. It aims to enable LLMs to mimic how humans operate in the real world, where we reason verbally and take actions to gain knowledge. It lets LLMs interface with external tools, thereby improving their decision-making. With ReAct, an LLM can interpret and generate text, make informed judgements, and act on what it understands.

How ReAct Works?

ReAct combines reasoning and acting to solve complex language reasoning and decision-making tasks.

Chain-of-Thought (CoT) prompting works with reasoning steps only, relying heavily on the internal knowledge of the LLM, which makes it prone to fact hallucination. ReAct addresses this by allowing LLMs to generate both verbal reasoning traces and actions for a task.

This interaction is achieved through text actions that the model can use to ask questions or perform tasks to gain more information and better understand a situation. For instance, when faced with a multi-hop reasoning question, ReAct might initiate multiple search actions, each potentially being a call to an external tool.

The results of these actions are then used to generate a final answer.

By forcing the LLM to alternate between thinking and acting, ReAct converts it into an active agent in its surroundings, capable of completing tasks in a human-like fashion.
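
To make the pattern concrete, here is a hypothetical ReAct-style trace for a multi-hop question. The Thought/Action/Observation format below illustrates the prompting pattern; it is not the exact prompt that LangChain or Cohere uses internally.

# An illustrative (hypothetical) ReAct trace for a multi-hop question
react_trace_example = """
Question: Which countries hosted the tournament won by the 2011 Cricket World Cup winner?
Thought: I first need to find out who won the 2011 Cricket World Cup.
Action: internet_search[2011 Cricket World Cup winner]
Observation: India won the 2011 Cricket World Cup.
Thought: Now I need to find out which countries hosted that tournament.
Action: internet_search[2011 Cricket World Cup host countries]
Observation: The 2011 Cricket World Cup was co-hosted by India, Sri Lanka and Bangladesh.
Thought: I now have enough information to answer.
Final Answer: It was co-hosted by India, Sri Lanka and Bangladesh.
"""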

 ReAct Framework for Prompting

When to Use ReAct Prompting?

ReAct is ideal for scenarios where the LLM has to rely on external tools and agents and interact with them to fetch information for its various reasoning steps.

  • One of ReAct’s key features is the ability to combine LLMs with other tools for real-world applications. For example, Microsoft has integrated OpenAI LLMs into its Office apps through Microsoft 365 Copilot, demonstrating their value.
  • In scenarios like QA systems, where LLMs might not always provide correct answers, the interaction with an external search engine becomes crucial, and ReAct proves invaluable.

Important Components Used

Let us now look at the important components used:

LLM Model

Cohere’s Command R is a scalable generative model targeting RAG and Tool Use to enable production-scale AI for enterprise.

  • Strong accuracy on RAG and Tool Use.
  • Low latency, and high throughput.
  • Strong capabilities across 10 key languages.
  • Longer 128k context and lower pricing.
  • Model weights available on Hugging Face for research and evaluation.

Vector DB

We require a vector store for RAG. In our implementation we have used Chroma DB, a popular open-source vector store for storing and indexing embeddings. It is available as a LangChain integration.

Web Search API

The web search tool requires an internet search API. Instead of the conventional DuckDuckGo search API, we will use a specialized one, Tavily AI. It is a search engine optimized for LLMs and RAG, aimed at efficient, quick, and persistent search results.

Orchestration Framework

Orchestration tools in the context of LLM applications are software frameworks designed to streamline and manage complex processes involving multiple components and interactions with LLMs. Building LLM chatbots and applications requires a framework to handle the glue code so that we can focus on the higher-level logic. LangChain is the most popular such framework, and we will use it to build the ReAct agent that serves as our question classifier.

Implementing Simple Adaptive RAG Using a LangChain Agent and Cohere LLM

Let us now implement a simple Adaptive RAG using a LangChain agent and the Cohere LLM:

Step 1 – Generate Cohere API Key

We need to generate a free API key to use the Cohere LLM. Visit the website and log in using a Google or GitHub account. Once logged in, you will land on the Cohere dashboard page shown below. Click on the API Keys option; you will see that a free Trial API key has been generated.

 Cohere API Key

Step 2 – Generate Tavily Search API Key

Visit the sign-in page of the site here and log in using a Google or GitHub account.

 Sign In Page of Tavily

Once you sign in, you will land on your account’s home page, which shows a default free plan with an API key already generated, similar to the screen below.

 Tavily API Key

Step 3 – Install Libraries

Once the API keys are generated, install the required libraries as shown below. You can use a Colab notebook for development.

! pip install --quiet langchain langchain_cohere tiktoken chromadb pymupdf

Step 4 – Set API Keys

Set the API Keys as environment variables:

### Set API Keys
import os

os.environ["COHERE_API_KEY"] = "Cohere API Key"
os.environ["TAVILY_API_KEY"] = "Tavily API Key"

Step 5 – Create the Web Search Tool

Now we will create the web search tool using an instance of LangChain’s Tavily Search integration, “TavilySearchResults”:

from langchain_community.tools.tavily_search import TavilySearchResults

internet_search = TavilySearchResults()
internet_search.name = "internet_search"
internet_search.description = "Returns a list of relevant document snippets for a textual query retrieved from the internet."

from langchain_core.pydantic_v1 import BaseModel, Field

class TavilySearchInput(BaseModel):
    query: str = Field(description="Query to search the internet with")


internet_search.args_schema = TavilySearchInput
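
As a quick sanity check (with a hypothetical query), the tool can be invoked directly; it should return a list of snippet dictionaries containing fields such as “url” and “content”:

# Quick test of the web search tool (hypothetical query)
results = internet_search.invoke({"query": "Adaptive RAG for question answering"})
print(results[0])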

Step 6 – Create the RAG Tool

Now we will create the RAG tool on top of a document; in our case we used an uploaded PDF.

We use Cohere Embeddings to embed the PDF and PyMuPDF to read the PDF text into Document objects. We also use the RecursiveCharacterTextSplitter to split the documents into chunks.

Then, using Chroma DB, we store the document embeddings, index them, and persist the index to a directory.

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_cohere import CohereEmbeddings
#from langchain_community.document_loaders import WebBaseLoader
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import Chroma

# Set embeddings
embd = CohereEmbeddings()

# Load Docs to Index
loader = PyMuPDFLoader('/content/cleartax-in-s-income-tax-slabs.pdf') #PDF Path
data = loader.load()

#print(data[10])

# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=512, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(data)

# Add to vectorstore
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    embedding=embd,
    persist_directory='/content/vector',  # persist the index to this directory
)

vectorstore_retriever = vectorstore.as_retriever()
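
Before wiring the retriever into a tool, it is worth checking (with a hypothetical query) that the index returns relevant chunks. This assumes a recent LangChain version in which retrievers expose invoke; on older versions, use get_relevant_documents instead.

# Sanity check: fetch the top matching chunks for a sample query
docs = vectorstore_retriever.invoke("income tax slabs under the new regime")
for doc in docs[:2]:
    print(doc.page_content[:200])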

Step 7 – Build Retriever Tool

Now we use the vector retriever created above to build a retriever tool, which the classifier (ReAct agent) will use to direct appropriate queries to RAG.

from langchain.tools.retriever import create_retriever_tool

vectorstore_search = create_retriever_tool(
    retriever=vectorstore_retriever,
    name="vectorstore_search",
    description="Retrieve relevant info from a vectorstore that contains documents related to Income Tax of India New and Old Regime Rules",
)
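
As a quick test (hypothetical query), the tool created by create_retriever_tool can be invoked directly; it returns the retrieved chunks joined into a single string.

# The retriever tool returns the retrieved chunks as one string
print(vectorstore_search.invoke("deductions under the old tax regime"))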

Step 8 – Create the ReAct Agent

The ReAct agent is based on the Reasoning + Acting framework for LLMs: at every step it reasons about the task and then takes an appropriate action based on that reasoning.

from langchain.agents import AgentExecutor
from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent
from langchain_core.prompts import ChatPromptTemplate

# LLM
from langchain_cohere.chat_models import ChatCohere

chat = ChatCohere(model="command-r-plus", temperature=0.3)

# Preamble
preamble = """
You are an expert who answers the user's question with the most relevant datasource.
You are equipped with an internet search tool and a special vectorstore of information about Income Tax Rules and Regulations of India.
If the query covers the topics of Income tax old and new regime India Rules and regulations then use the vectorstore search.
"""

# Prompt
prompt = ChatPromptTemplate.from_template("{input}")

# Create the ReAct agent
agent = create_cohere_react_agent(
    llm=chat,
    tools=[internet_search, vectorstore_search],
    prompt=prompt,
)

Step 9 – Create Agent Executor

Now that we have all the required components, we create an executor wrapper through which we can call the ReAct agent. We pass the agent via the agent parameter and the list of tools via the tools parameter.

# Agent Executor

agent_executor = AgentExecutor(
    agent=agent, tools=[internet_search, vectorstore_search], verbose=True
)

Step 10 – Testing the Agent Tool

Now let us test the ReAct Agent by asking different queries.

Asking a Query on Current Affairs

output = agent_executor.invoke(
    {
        "input": "What is the general election schedule of India 2024?",
        "preamble": preamble,
    }
)

print(output)

print(output['output'])

Output:

The 2024 Indian general election will be held between April 19 and June 1, across seven phases. The counting of votes will take place on June 4, 2024.
 Chain of Thought for Agent Executor while doing internet search

Query Related to the Document

output = agent_executor.invoke(
    {
        "input": "How much deduction is required for a salary of 13lakh so that Old regime is better tahn New regime Threshold?",
        "preamble": preamble,
    }
)

print(output)

print(output['output'])

Output:

The old regime is better for people who have a financial plan for wealth creation by making investments in tax-saving instruments; medical claims and life insurance; making payments of children’s tuition fees; payment of EMIs on education loan; buying a house with a home loan; and so on. The old regime helps with higher tax deductions and lower tax outgo.

The new regime is better for people who make low investments. As the new regime offers six lower-income tax slabs, anyone paying taxes without claiming tax deductions can benefit from paying a lower rate of tax under the new tax regime.

For a salary of 13 lakhs, the old regime will be better if the total deductions are more than 3.75 lakhs.

Directly Answered Queries

Now we will ask a query that requires neither an internet search nor RAG.

output = agent_executor.invoke(
    {
        "input": "What is your name?",
        "preamble": preamble,
    }
)

print(output)

print(output['output'])

Output:

I am an AI assistant trained to answer your queries about the Income Tax Rules 
and Regulations of India. I do not have a name.

Conclusion

Adaptive RAG is a dynamic QA framework that uses a classifier to predict query complexity levels and transitions between iterative and single-step retrieval strategies. It enhances efficiency and accuracy in QA systems. Implemented with Langchain Agent and Cohere LLM, it offers improved decision-making and versatile interaction with external tools. As language models and QA systems evolve, Adaptive RAG is a valuable strategy for managing information retrieval and response selection.

Frequently Asked Questions

Q1. Is the Cohere API free to use?

A. Yes, Cohere currently allows free, rate-limited API calls for research and prototyping here.

Q2. What are the advantages of the Tavily Search API?

A. It is more optimized for searches with RAG and LLMs compared to other conventional search APIs.

Q3. What are the limitations of Adaptive RAG?

A. Although Adaptive RAG is a novel question-answering strategy, it has its limitations, one of which is its dependency on a good classifier, generally a smaller LM, to dynamically route queries to the appropriate tool.

Q4. What are the further scopes of improvement in this strategy?

A. We can further enhance this Adaptive RAG strategy by integrating Self-Reflection in RAG, which fetches documents with self-reasoning and refines the answer iteratively.

Q5. What are the other LLM models offered by Cohere?

A. Cohere offers several model versions. The initial versions were Command and Command R; Command R+ is the latest model, which is multilingual with a larger 128k context window. Apart from these LLMs, Cohere also offers an embedding model, Embed, and a reranking model, Rerank.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

