Chain of Verification Implementation Using LangChain Expression Language and LLM 

Ritika Gupta 15 Dec, 2023
9 min read


The constant quest for precision and dependability in the field of Artificial Intelligence (AI) has brought in game-changing innovations. These strategies are critical in leading generative models to offer relevant answers to a range of questions. One of the biggest barriers to the use of Generative AI in different sophisticated applications is hallucination. The recent paper released by Meta AI Research titled “Chain of Verification Reduces Hallucination in Large Language Models”  discusses a simple technique for directly reducing hallucination when generating text.

In this article, we will learn about hallucination problems and explore concepts of CoVe mentioned in the paper and how to implement it using LLMs, LangChain Framework, and LangChain Expression Language (LCEL) to create custom chains.

Learning Objectives

  • Understand the problem of hallucination in LLMs.
  • Learn about the Chain of Verification (CoVe) mechanism to mitigate hallucination.
  • Know about the advantages and disadvantages of CoVe.
  • Learn to implement the CoVe using LangChain and understand LangChain Expression Language.

This article was published as a part of the Data Science Blogathon.

What is the Hallucination Problem in LLMs?

Let us first attempt to learn about the hallucination issue in LLM. Using the autoregressive generation approach, the LLM model predicts the next word given the previous context. For frequent themes, the model has seen enough examples to confidently assign a high probability to correct tokens. However, because the model has not been trained on unusual or unfamiliar topics, it may deliver inaccurate tokens with high confidence. This results in hallucinations of plausible-sounding but wrong information.

Below is one such example of hallucination in Open AI’s ChatGPT where I asked about the book “Economics of Small Things”, published in 2020 by an Indian author, but the model spit the wrong answer with full confidence and confused it with the book of another Nobel prize winner, Abhijit Banerjee, titled “Poor Economics”.

Example of Hallucination in LLMs

Chain of Verification (CoVe) Technique

The CoVe mechanism combines prompting and consistency checks to create a self-verification system for LLMs. Below are the major steps listed in the paper. We will try to understand each step in detail one by one.

Overview of Chain Process

  1. Generate Baseline Response: Given a query, generate the response using the LLM.
  2. Plan Verifications: Given both query and baseline response, generate a list of verification questions that could help to self-analyze if there are any mistakes in the original response.
  3. Execute Verifications: Answer each verification question in turn, and hence check the answer against the original response to check for inconsistencies or mistakes.
  4. Generate Final Verified Response: Given the discovered inconsistencies (if any), generate a revised response incorporating the verification results.
Steps of Chain of Verification to remove hallucination

Understanding Chain Process Using Detailed Example

Chain verification process - example

Generate Initial Response

We first pass our query to LLM without any special prompting to generate an initial response. This serves as the starting point for the CoVe process. Because baseline reactions like these are frequently prone to hallucinations, the CoVe technique seeks to discover and correct these errors in later phases.

Example –  “What is the primary cause of the Mexican-American War?”

Bot Response – The Mexican-American War was an armed conflict between the United States and Mexico from 1846 to 1848. It was followed in the wake of the 1845 U.S. annexation of Texas, which was considered part of its territory in spite of its de facto secession in the 1835 Texas Revolution.

Plan Verifications

Create verification questions based on the query and baseline answer that allow us to examine the baseline response’s factual claims. To implement this we can prompt model a series of verification questions based on both the query and baseline response. Verification questions can be flexible and need not be matched exactly to the original text.

Example – When did Mexican – American war start and end? When did the US annex Texas? When did Texas secede from Mexico?

Execute Verifications

Once we have planned verification questions we can then answer these questions individually. The paper discusses 4 different methods to execute verifications:

1. Joint – In this, the planning and execution of verification questions are done in a single prompt. The questions and their answers are provided in the same LLM prompt. This method is generally not recommended as verification response can be hallucinated.

2. 2-Step – The planning and execution are done separately in two steps with separate LLM prompts. First, we generate verification questions and then we answer those questions.

3. Factored – Here, each verification question is answered independently instead of in same same big response, and the baseline original response is not included. It can help avoid confusion between different verification questions and also can handle more number of questions.

4. Factored + Revise – An additional step is added in this method. After answering every verification question, the CoVe mechanism checks whether the answers match with the original baseline response. This is done in a separate step using an additional prompt.

External Tools or Self LLM: We need a tool that will verify our responses and give verification answers. This can be performed using either the LLM itself or an external tool. If we want greater accuracy then instead of relying on LLM we can use external tools like an internet search engine, any reference document, or any website depending on our use case.

Final Verified Response

In this final step, an improved and verified response is generated. A few-shot prompt is used and all previous context of baseline response and verification question answers are included. If the “Factor+Revise” method was used then the output of cross-checked inconsistency is also provided.

Limitations of CoVe Technique

Although Chain of Verification seems a simple but effective technique still it has some limitations:

  1. Hallucination Not Fully Removed: It does not guarantee the complete removal of hallucinations from the response and hence can produce misleading information.
  2. Compute Intensive: Generating and executing verifications along with response generation can add to computational overhead and cost. Thus, it can slow down the process or increase the computing cost.
  3. Model Specific Limitation: The success of this CoVe method largely depends on the model’s capabilities and its ability to identify and rectify its mistakes.

LangChain Implementation of CoVe

Basic Outline of Algorithm

Here we will use 4 different prompt templates for each of the 4 steps in CoVe and at each step output of the previous step acts as an input for the next step. Also, we follow a factored approach to the execution of verification questions. We use an external internet search tool agent to generate answers for our verification questions.

Flowchart of steps followed in implementation of CoVe using LangChain

Step 1: Install and Load Libraries

!pip install langchain duckduckgo-search

Step 2: Create and Initialize the LLM  Instance

Here am using Google Palm LLM in Langchain since it is freely available. One can generate the API Key for Google Palm using this link and log in using your Google account.

from langchain import PromptTemplate
from langchain.llms import GooglePalm
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda

API_KEY='Generated API KEY'
llm.model_name = 'models/text-bison-001'

Step 3: Generate Initial Baseline Response

We will now create a prompt template to generate the initial baseline response and using this template will create the baseline response LLM chain.

An LLM chain will use the LangChain Expression Language to compose the chain. Here we give the prompt template chained (|) with LLM model (|) and then finally Output parser.

BASELINE_PROMPT = """Answer the below question which is asking for a concise factual answer. NO ADDITIONAL DETAILS.

Question: {query}


# Chain to generate initial response
baseline_response_prompt_template = PromptTemplate.from_template(BASELINE_PROMPT)
baseline_response_chain = baseline_response_prompt_template | llm | StrOutputParser()

Step 4: Generate Question Template for Verification Question

Now we will construct a verification question template which will then help to generate the verification questions in the next step.

VERIFICATION_QUESTION_TEMPLATE = """Your task is to create a verification question based on the below question provided.
Example Question: Who wrote the book 'God of Small Things' ?
Example Verification Question: Was book [God of Small Things] written by [writer]? If not who wrote [God of Small Things] ? 
Explanation: In the above example the verification question focused only on the ANSWER_ENTITY (name of the writer) and QUESTION_ENTITY (book name).
Similarly you need to focus on the ANSWER_ENTITY and QUESTION_ENTITY from the actual question and generate verification question.

Actual Question: {query}

Final Verification Question:"""

# Chain to generate a question template for verification answers
verification_question_template_prompt_template = PromptTemplate.from_template(VERIFICATION_QUESTION_TEMPLATE)
verification_question_template_chain = verification_question_template_prompt_template | llm | StrOutputParser()

Step 5: Generate Verification Question

Now we will generate verification questions using the verification question template defined above:

VERIFICATION_QUESTION_PROMPT= """Your task is to create a series of verification questions based on the below question, the verfication question template and baseline response.
Example Question: Who wrote the book 'God of Small Things' ?
Example Verification Question Template: Was book [God of Small Things] written by [writer]? If not who wrote [God of Small Things]?
Example Baseline Response: Jhumpa Lahiri
Example Verification Question: 1. Was God of Small Things written by Jhumpa Lahiri? If not who wrote God of Small Things ?

Explanation: In the above example the verification questions focused only on the ANSWER_ENTITY (name of the writer) and QUESTION_ENTITY (name of book) based on the template and substitutes entity values from the baseline response.
Similarly you need to focus on the ANSWER_ENTITY and QUESTION_ENTITY from the actual question and substitute the entity values from the baseline response to generate verification questions.

Actual Question: {query}
Baseline Response: {base_response}
Verification Question Template: {verification_question_template}

Final Verification Questions:"""

# Chain to generate the verification questions
verification_question_generation_prompt_template = PromptTemplate.from_template(VERIFICATION_QUESTION_PROMPT)
verification_question_generation_chain = verification_question_generation_prompt_template | llm | StrOutputParser()

Step 6: Execute Verification Question

Here we will use the external search tool agent to execute the verification question. This agent is constructed using LangChain’s Agent and Tools module and DuckDuckGo search module.

Note – There are time restrictions in search agents to use carefully as multiple requests can result in an error due to time restrictions between requests

from langchain.agents import ConversationalChatAgent, AgentExecutor
from import DuckDuckGoSearchResults

#create search agent
search = DuckDuckGoSearchResults()
tools = [search]
custom_system_message = "Assistant assumes no knowledge & relies on internet search to answer user's queries."
max_agent_iterations = 5
max_execution_time = 10

chat_agent = ConversationalChatAgent.from_llm_and_tools(
    llm=llm, tools=tools, system_message=custom_system_message
search_executor = AgentExecutor.from_agent_and_tools(
    max_execution_time = max_execution_time

# chain to execute verification questions
verification_chain = RunnablePassthrough.assign(
    split_questions=lambda x: x['verification_questions'].split("\n"), # each verification question is passed one by one factored approach
) | RunnablePassthrough.assign(
    answers = (lambda x: [{"input": q,"chat_history": []} for q in x['split_questions']])| # search executed for each question independently
) | (lambda x: "\n".join(["Question: {} Answer: {}\n".format(question, answer['output']) for question, answer in zip(x['split_questions'], x['answers'])]))# Create final refined response

Step 7: Generate Final Refined Response

Now we will Generate the final refined answer for which we define the prompt template and llm chain.

FINAL_ANSWER_PROMPT= """Given the below `Original Query` and `Baseline Answer`, analyze the `Verification Questions & Answers` to finally provide the refined answer.
Original Query: {query}
Baseline Answer: {base_response}

Verification Questions & Answer Pairs:

Final Refined Answer:"""

# Chain to generate the final answer
final_answer_prompt_template = PromptTemplate.from_template(FINAL_ANSWER_PROMPT)
final_answer_chain = final_answer_prompt_template | llm | StrOutputParser()

Step 8: Put All the Chains Together

Now we put together all the chains that we defined earlier so that they run in sequence in one go.

chain = RunnablePassthrough.assign(
) |  RunnablePassthrough.assign(
) | RunnablePassthrough.assign(
) | RunnablePassthrough.assign(
) | RunnablePassthrough.assign(

response = chain.invoke({"query": "Who wrote the book 'Economics of Small Things' ?"})
#output of response
{'query': "Who wrote the book 'Economics of Small Things' ?", 'base_response': 'Sanjay Jain', 'verification_question_template': 'Was book [Economics of Small Things] written by [writer]? If not who wrote [Economics of Small Things] ?', 'verification_questions': '1. Was Economics of Small Things written by Sanjay Jain? If not who wrote Economics of Small Things ?', 'verification_answers': 'Question: 1. Was Economics of Small Things written by Sanjay Jain? If not who wrote Economics of Small Things ? Answer: The Economics of Small Things was written by Sudipta Sarangi \n', 'final_answer': 'Sudipta Sarangi'}

Output image:

Output image of CoVe process using LangChain in LLMs


The Chain-of-Verification (CoVe) technique proposed in the study is a strategy aiming to construct big language models, think more critically about their replies, and correct themselves if necessary. This is because this method divides the verification into smaller, more manageable queries. It has also been shown that prohibiting the model from reviewing its prior replies helps to avoid repeating any errors or “hallucinations.” Simply requiring the model to double-check its answers increases its results significantly. Giving CoVe more capabilities, such as allowing it to draw information from external sources, might be one way to increase its effectiveness.

Key Takeaways

  • The Chain process is a useful tool with various combinations of techniques that enable us to verify different parts of our response.
  • Apart from many advantages, there are certain limitations of the Chain process which can be mitigated using different tools and mechanisms.
  • We can leverage the LangChain package to implement this CoVe process.

Frequently Asked Questions

Q1. What are the other techniques for reducing Hallucinations?

A. There are multiple ways to reduce Hallucination at different levels: Prompt Level (Tree of Thought, Chain of Thought), Model Level (DoLa Decoding by Contrasting Layers), and Self-Check (CoVe).

Q2. How can we improve this CoVe process further?

A. We can improve the verification process in CoVe by using support from External Search Tools like Google Search API etc. and for domain and custom use cases we can use retrieval techniques such as RAG.

Q3. Is there any library or framework that supports this verification mechanism?

A. Currently there is no ready-to-use open-source tool implementing this mechanism but we can construct one on our own using the help of Serp API, Google Search, and Lang Chains.

Q4. What is RAG and can it help in Hallucinations?

A. Retrieval Augmented Generation (RAG) technique is used for domain-specific use cases where LLM can produce factually correct responses based on retrieval from this domain-specific data.

Q5. How does the paper implement the CoVe pipeline?

A. The paper used the Llama 65B model as LLM, they then used prompts engineering using few-shot examples to generate questions and give guidance to the model.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Ritika Gupta 15 Dec, 2023

Frequently Asked Questions

Lorem ipsum dolor sit amet, consectetur adipiscing elit,

Responses From Readers