Chain of Verification Implementation Using LangChain Expression Language and LLM

Ritika Last Updated : 15 Dec, 2023

9 min read

Introduction

The constant quest for precision and dependability in the field of Artificial Intelligence (AI) has brought in game-changing innovations. These strategies are critical in leading generative models to offer relevant answers to a range of questions. One of the biggest barriers to the use of Generative AI in different sophisticated applications is hallucination. The recent paper released by Meta AI Research titled “Chain of Verification Reduces Hallucination in Large Language Models” discusses a simple technique for directly reducing hallucination when generating text.

In this article, we will learn about hallucination problems and explore concepts of CoVe mentioned in the paper and how to implement it using LLMs, LangChain Framework, and LangChain Expression Language (LCEL) to create custom chains.

Learning Objectives

Understand the problem of hallucination in LLMs.
Learn about the Chain of Verification (CoVe) mechanism to mitigate hallucination.
Know about the advantages and disadvantages of CoVe.
Learn to implement the CoVe using LangChain and understand LangChain Expression Language.

This article was published as a part of the Data Science Blogathon.

What is the Hallucination Problem in LLMs?
Chain of Verification (CoVe) Technique
- Overview of Chain Process
- Understanding Chain Process Using Detailed Example
Limitations of CoVe Technique
LangChain Implementation of CoVe
Frequently Asked Questions

What is the Hallucination Problem in LLMs?

Let us first attempt to learn about the hallucination issue in LLM. Using the autoregressive generation approach, the LLM model predicts the next word given the previous context. For frequent themes, the model has seen enough examples to confidently assign a high probability to correct tokens. However, because the model has not been trained on unusual or unfamiliar topics, it may deliver inaccurate tokens with high confidence. This results in hallucinations of plausible-sounding but wrong information.

Below is one such example of hallucination in Open AI’s ChatGPT where I asked about the book “Economics of Small Things”, published in 2020 by an Indian author, but the model spit the wrong answer with full confidence and confused it with the book of another Nobel prize winner, Abhijit Banerjee, titled “Poor Economics”.

Chain of Verification (CoVe) Technique

The CoVe mechanism combines prompting and consistency checks to create a self-verification system for LLMs. Below are the major steps listed in the paper. We will try to understand each step in detail one by one.

Overview of Chain Process

Generate Baseline Response: Given a query, generate the response using the LLM.
Plan Verifications: Given both query and baseline response, generate a list of verification questions that could help to self-analyze if there are any mistakes in the original response.
Execute Verifications: Answer each verification question in turn, and hence check the answer against the original response to check for inconsistencies or mistakes.
Generate Final Verified Response: Given the discovered inconsistencies (if any), generate a revised response incorporating the verification results.

Steps of Chain of Verification to remove hallucination

Understanding Chain Process Using Detailed Example

Generate Initial Response

We first pass our query to LLM without any special prompting to generate an initial response. This serves as the starting point for the CoVe process. Because baseline reactions like these are frequently prone to hallucinations, the CoVe technique seeks to discover and correct these errors in later phases.

Example – “What is the primary cause of the Mexican-American War?”

Bot Response – The Mexican-American War was an armed conflict between the United States and Mexico from 1846 to 1848. It was followed in the wake of the 1845 U.S. annexation of Texas, which was considered part of its territory in spite of its de facto secession in the 1835 Texas Revolution.

Plan Verifications

Create verification questions based on the query and baseline answer that allow us to examine the baseline response’s factual claims. To implement this we can prompt model a series of verification questions based on both the query and baseline response. Verification questions can be flexible and need not be matched exactly to the original text.

Example – When did Mexican – American war start and end? When did the US annex Texas? When did Texas secede from Mexico?

Execute Verifications

Once we have planned verification questions we can then answer these questions individually. The paper discusses 4 different methods to execute verifications:

1. Joint – In this, the planning and execution of verification questions are done in a single prompt. The questions and their answers are provided in the same LLM prompt. This method is generally not recommended as verification response can be hallucinated.

2. 2-Step – The planning and execution are done separately in two steps with separate LLM prompts. First, we generate verification questions and then we answer those questions.

3. Factored – Here, each verification question is answered independently instead of in same same big response, and the baseline original response is not included. It can help avoid confusion between different verification questions and also can handle more number of questions.

4. Factored + Revise – An additional step is added in this method. After answering every verification question, the CoVe mechanism checks whether the answers match with the original baseline response. This is done in a separate step using an additional prompt.

External Tools or Self LLM: We need a tool that will verify our responses and give verification answers. This can be performed using either the LLM itself or an external tool. If we want greater accuracy then instead of relying on LLM we can use external tools like an internet search engine, any reference document, or any website depending on our use case.

Final Verified Response

In this final step, an improved and verified response is generated. A few-shot prompt is used and all previous context of baseline response and verification question answers are included. If the “Factor+Revise” method was used then the output of cross-checked inconsistency is also provided.

Limitations of CoVe Technique

Although Chain of Verification seems a simple but effective technique still it has some limitations:

Hallucination Not Fully Removed: It does not guarantee the complete removal of hallucinations from the response and hence can produce misleading information.
Compute Intensive: Generating and executing verifications along with response generation can add to computational overhead and cost. Thus, it can slow down the process or increase the computing cost.
Model Specific Limitation: The success of this CoVe method largely depends on the model’s capabilities and its ability to identify and rectify its mistakes.

LangChain Implementation of CoVe

Basic Outline of Algorithm

Here we will use 4 different prompt templates for each of the 4 steps in CoVe and at each step output of the previous step acts as an input for the next step. Also, we follow a factored approach to the execution of verification questions. We use an external internet search tool agent to generate answers for our verification questions.

Flowchart of steps followed in implementation of CoVe using LangChain

Step 1: Install and Load Libraries

!pip install langchain duckduckgo-search

Step 2: Create and Initialize the LLM Instance

Here am using Google Palm LLM in Langchain since it is freely available. One can generate the API Key for Google Palm using this link and log in using your Google account.

from langchain import PromptTemplate
from langchain.llms import GooglePalm
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda


API_KEY='Generated API KEY'
llm=GooglePalm(google_api_key=API_KEY)
llm.temperature=0.4
llm.model_name = 'models/text-bison-001'
llm.max_output_tokens=2048

Step 3: Generate Initial Baseline Response

We will now create a prompt template to generate the initial baseline response and using this template will create the baseline response LLM chain.

An LLM chain will use the LangChain Expression Language to compose the chain. Here we give the prompt template chained (|) with LLM model (|) and then finally Output parser.

BASELINE_PROMPT = """Answer the below question which is asking for a concise factual answer. NO ADDITIONAL DETAILS.

Question: {query}

Answer:"""


# Chain to generate initial response
baseline_response_prompt_template = PromptTemplate.from_template(BASELINE_PROMPT)
baseline_response_chain = baseline_response_prompt_template | llm | StrOutputParser()

Step 4: Generate Question Template for Verification Question

Now we will construct a verification question template which will then help to generate the verification questions in the next step.

VERIFICATION_QUESTION_TEMPLATE = """Your task is to create a verification question based on the below question provided.
Example Question: Who wrote the book 'God of Small Things' ?
Example Verification Question: Was book [God of Small Things] written by [writer]? If not who wrote [God of Small Things] ? 
Explanation: In the above example the verification question focused only on the ANSWER_ENTITY (name of the writer) and QUESTION_ENTITY (book name).
Similarly you need to focus on the ANSWER_ENTITY and QUESTION_ENTITY from the actual question and generate verification question.

Actual Question: {query}

Final Verification Question:"""


# Chain to generate a question template for verification answers
verification_question_template_prompt_template = PromptTemplate.from_template(VERIFICATION_QUESTION_TEMPLATE)
verification_question_template_chain = verification_question_template_prompt_template | llm | StrOutputParser()

Step 5: Generate Verification Question

Now we will generate verification questions using the verification question template defined above:

VERIFICATION_QUESTION_PROMPT= """Your task is to create a series of verification questions based on the below question, the verfication question template and baseline response.
Example Question: Who wrote the book 'God of Small Things' ?
Example Verification Question Template: Was book [God of Small Things] written by [writer]? If not who wrote [God of Small Things]?
Example Baseline Response: Jhumpa Lahiri
Example Verification Question: 1. Was God of Small Things written by Jhumpa Lahiri? If not who wrote God of Small Things ?


Explanation: In the above example the verification questions focused only on the ANSWER_ENTITY (name of the writer) and QUESTION_ENTITY (name of book) based on the template and substitutes entity values from the baseline response.
Similarly you need to focus on the ANSWER_ENTITY and QUESTION_ENTITY from the actual question and substitute the entity values from the baseline response to generate verification questions.

Actual Question: {query}
Baseline Response: {base_response}
Verification Question Template: {verification_question_template}

Final Verification Questions:"""


# Chain to generate the verification questions
verification_question_generation_prompt_template = PromptTemplate.from_template(VERIFICATION_QUESTION_PROMPT)
verification_question_generation_chain = verification_question_generation_prompt_template | llm | StrOutputParser()

Step 6: Execute Verification Question

Here we will use the external search tool agent to execute the verification question. This agent is constructed using LangChain’s Agent and Tools module and DuckDuckGo search module.

Note – There are time restrictions in search agents to use carefully as multiple requests can result in an error due to time restrictions between requests

from langchain.agents import ConversationalChatAgent, AgentExecutor
from langchain.tools import DuckDuckGoSearchResults

#create search agent
search = DuckDuckGoSearchResults()
tools = [search]
custom_system_message = "Assistant assumes no knowledge & relies on internet search to answer user's queries."
max_agent_iterations = 5
max_execution_time = 10

chat_agent = ConversationalChatAgent.from_llm_and_tools(
    llm=llm, tools=tools, system_message=custom_system_message
)
search_executor = AgentExecutor.from_agent_and_tools(
    agent=chat_agent,
    tools=tools,
    return_intermediate_steps=True,
    handle_parsing_errors=True,
    max_iterations=max_agent_iterations,
    max_execution_time = max_execution_time
)

# chain to execute verification questions
verification_chain = RunnablePassthrough.assign(
    split_questions=lambda x: x['verification_questions'].split("\n"), # each verification question is passed one by one factored approach
) | RunnablePassthrough.assign(
    answers = (lambda x: [{"input": q,"chat_history": []} for q in x['split_questions']])| search_executor.map() # search executed for each question independently
) | (lambda x: "\n".join(["Question: {} Answer: {}\n".format(question, answer['output']) for question, answer in zip(x['split_questions'], x['answers'])]))# Create final refined response

Step 7: Generate Final Refined Response

Now we will Generate the final refined answer for which we define the prompt template and llm chain.

FINAL_ANSWER_PROMPT= """Given the below `Original Query` and `Baseline Answer`, analyze the `Verification Questions & Answers` to finally provide the refined answer.
Original Query: {query}
Baseline Answer: {base_response}

Verification Questions & Answer Pairs:
{verification_answers}

Final Refined Answer:"""


# Chain to generate the final answer
final_answer_prompt_template = PromptTemplate.from_template(FINAL_ANSWER_PROMPT)
final_answer_chain = final_answer_prompt_template | llm | StrOutputParser()

Step 8: Put All the Chains Together

Now we put together all the chains that we defined earlier so that they run in sequence in one go.

chain = RunnablePassthrough.assign(
    base_response=baseline_response_chain
) |  RunnablePassthrough.assign(
    verification_question_template=verification_question_template_chain
) | RunnablePassthrough.assign(
    verification_questions=verification_question_generation_chain
) | RunnablePassthrough.assign(
    verification_answers=verification_chain
) | RunnablePassthrough.assign(
    final_answer=final_answer_chain
)

response = chain.invoke({"query": "Who wrote the book 'Economics of Small Things' ?"})
print(response)

#output of response
{'query': "Who wrote the book 'Economics of Small Things' ?", 'base_response': 'Sanjay Jain', 'verification_question_template': 'Was book [Economics of Small Things] written by [writer]? If not who wrote [Economics of Small Things] ?', 'verification_questions': '1. Was Economics of Small Things written by Sanjay Jain? If not who wrote Economics of Small Things ?', 'verification_answers': 'Question: 1. Was Economics of Small Things written by Sanjay Jain? If not who wrote Economics of Small Things ? Answer: The Economics of Small Things was written by Sudipta Sarangi \n', 'final_answer': 'Sudipta Sarangi'}

Output image:

Conclusion

The Chain-of-Verification (CoVe) technique proposed in the study is a strategy aiming to construct big language models, think more critically about their replies, and correct themselves if necessary. This is because this method divides the verification into smaller, more manageable queries. It has also been shown that prohibiting the model from reviewing its prior replies helps to avoid repeating any errors or “hallucinations.” Simply requiring the model to double-check its answers increases its results significantly. Giving CoVe more capabilities, such as allowing it to draw information from external sources, might be one way to increase its effectiveness.

Key Takeaways

The Chain process is a useful tool with various combinations of techniques that enable us to verify different parts of our response.
Apart from many advantages, there are certain limitations of the Chain process which can be mitigated using different tools and mechanisms.
We can leverage the LangChain package to implement this CoVe process.

Frequently Asked Questions

Q1. What are the other techniques for reducing Hallucinations?

A. There are multiple ways to reduce Hallucination at different levels: Prompt Level (Tree of Thought, Chain of Thought), Model Level (DoLa Decoding by Contrasting Layers), and Self-Check (CoVe).

Q2. How can we improve this CoVe process further?

A. We can improve the verification process in CoVe by using support from External Search Tools like Google Search API etc. and for domain and custom use cases we can use retrieval techniques such as RAG.

Q3. Is there any library or framework that supports this verification mechanism?

A. Currently there is no ready-to-use open-source tool implementing this mechanism but we can construct one on our own using the help of Serp API, Google Search, and Lang Chains.

Q4. What is RAG and can it help in Hallucinations?

A. Retrieval Augmented Generation (RAG) technique is used for domain-specific use cases where LLM can produce factually correct responses based on retrieval from this domain-specific data.

Q5. How does the paper implement the CoVe pipeline?

A. The paper used the Llama 65B model as LLM, they then used prompts engineering using few-shot examples to generate questions and give guidance to the model.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Ritika

I am a professional working as data scientist after finishing my MBA in Business Analytics and Finance. A keen learner who loves to explore and understand and simplify stuff! I am currently learning about advanced ML and NLP techniques and reading up on various topics related to it including research papers .

Intermediate Langchain Large Language Models LLMs

Free Courses

4.7

Generative AI - A Way of Life

Explore Generative AI for beginners: create text and images, use top AI tools, learn practical skills, and ethics.

4.5

Getting Started with Large Language Models

Master Large Language Models (LLMs) with this course, offering clear guidance in NLP and model training made simple.

4.6

Building LLM Applications using Prompt Engineering

This free course guides you on building LLM apps, mastering prompt engineering, and developing chatbots with enterprise data.

4.8

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Explore practical solutions, advanced retrieval strategies, and agentic RAG systems to improve context, relevance, and accuracy in AI-driven applications.

4.7

Microsoft Excel: Formulas & Functions

Master MS Excel for data analysis with key formulas, functions, and LookUp tools in this comprehensive course.

MUID

Used by Microsoft Clarity, to store and track visits across websites.

Expiry: 1 Year

Type: HTTP

_clck

Used by Microsoft Clarity, Persists the Clarity User ID and preferences, unique to that site, on the browser. This ensures that behavior in subsequent visits to the same site will be attributed to the same user ID.

Expiry: 1 Year

Type: HTTP

_clsk

Used by Microsoft Clarity, Connects multiple page views by a user into a single Clarity session recording.

Expiry: 1 Day

Type: HTTP

SRM_I

Collects user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Years

Type: HTTP

SM

Use to measure the use of the website for internal analytics

Expiry: 1 Years

Type: HTTP

CLID

The cookie is set by embedded Microsoft Clarity scripts. The purpose of this cookie is for heatmap and session recording.

Expiry: 1 Year

Type: HTTP

SRM_B

Collected user data is specifically adapted to the user or device. The user can also be followed outside of the loaded website, creating a picture of the visitor's behavior.

Expiry: 2 Months

Type: HTTP

_gid

This cookie is installed by Google Analytics. The cookie is used to store information of how visitors use a website and helps in creating an analytics report of how the website is doing. The data collected includes the number of visitors, the source where they have come from, and the pages visited in an anonymous form.

Expiry: 399 Days

Type: HTTP

_ga_#

Used by Google Analytics, to store and count pageviews.

Expiry: 399 Days

Type: HTTP

_gat_#

Used by Google Analytics to collect data on the number of times a user has visited the website as well as dates for the first and most recent visit.

Expiry: 1 Day

Type: HTTP

collect

Used to send data to Google Analytics about the visitor's device and behavior. Tracks the visitor across devices and marketing channels.

Expiry: Session

Type: PIXEL

AEC

cookies ensure that requests within a browsing session are made by the user, and not by other sites.

Expiry: 6 Months

Type: HTTP

G_ENABLED_IDPS

use the cookie when customers want to make a referral from their gmail contacts; it helps auth the gmail account.

Expiry: 2 Years

Type: HTTP

test_cookie

This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor's browser supports cookies.

Expiry: 1 Year

Type: HTTP

_we_us

this is used to send push notification using webengage.

Expiry: 1 Year

Type: HTTP

WebKlipperAuth

used by webenage to track auth of webenagage.

Expiry: Session

Type: HTTP

ln_or

Linkedin sets this cookie to registers statistical data on users' behavior on the website for internal analytics.

Expiry: 1 Day

Type: HTTP

JSESSIONID

Use to maintain an anonymous user session by the server.

Expiry: 1 Year

Type: HTTP

li_rm

Used as part of the LinkedIn Remember Me feature and is set when a user clicks Remember Me on the device to make it easier for him or her to sign in to that device.

Expiry: 1 Year

Type: HTTP

AnalyticsSyncHistory

Used to store information about the time a sync with the lms_analytics cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

lms_analytics

Used to store information about the time a sync with the AnalyticsSyncHistory cookie took place for users in the Designated Countries.

Expiry: 6 Months

Type: HTTP

liap

Cookie used for Sign-in with Linkedin and/or to allow for the Linkedin follow feature.

Expiry: 6 Months

Type: HTTP

visit

allow for the Linkedin follow feature.

Expiry: 1 Year

Type: HTTP

li_at

often used to identify you, including your name, interests, and previous activity.

Expiry: 2 Months

Type: HTTP

s_plt

Tracks the time that the previous page took to load

Expiry: Session

Type: HTTP

lang

Used to remember a user's language setting to ensure LinkedIn.com displays in the language selected by the user in their settings

Expiry: Session

Type: HTTP

s_tp

Tracks percent of page viewed

Expiry: Session

Type: HTTP

AMCV_14215E3D5995C57C0A495C55%40AdobeOrg

Indicates the start of a session for Adobe Experience Cloud

Expiry: Session

Type: HTTP

s_pltp

Provides page name value (URL) for use by Adobe Analytics

Expiry: Session

Type: HTTP

s_tslv

Used to retain and fetch time since last visit in Adobe Analytics

Expiry: 6 Months

Type: HTTP

li_theme

Remembers a user's display preference/theme setting

Expiry: 6 Months

Type: HTTP

li_theme_set

Remembers which users have updated their display / theme preferences

Expiry: 6 Months

Type: HTTP

Reading list

Introduction to Generative AI

Introduction to Generative AI applications

No-code Generative AI app development

Code-focused Generative AI App Development

Introduction to Responsible AI

LLMS

Prompt Engineering

Finetuning LLMs

Training LLMs from Scratch

Langchain

RAG

LlamaIndex

Stable Diffusion

Chain of Verification Implementation Using LangChain Expression Language and LLM

Introduction

Learning Objectives

Table of Contents

What is the Hallucination Problem in LLMs?

Chain of Verification (CoVe) Technique

Overview of Chain Process

Understanding Chain Process Using Detailed Example

Generate Initial Response

Plan Verifications

Execute Verifications

Final Verified Response

Limitations of CoVe Technique

LangChain Implementation of CoVe

Basic Outline of Algorithm

Step 1: Install and Load Libraries

Step 2: Create and Initialize the LLM Instance

Step 3: Generate Initial Baseline Response

Step 4: Generate Question Template for Verification Question

Step 5: Generate Verification Question

Step 6: Execute Verification Question

Step 7: Generate Final Refined Response

Step 8: Put All the Chains Together

Conclusion

Key Takeaways

Frequently Asked Questions

Free Courses

Generative AI - A Way of Life

Getting Started with Large Language Models

Building LLM Applications using Prompt Engineering

Improving Real World RAG Systems: Key Challenges & Practical Solutions

Microsoft Excel: Formulas & Functions

Recommended Articles

Responses From Readers

Write for us

Congratulations, You Did It!

Analytics Vidhya (4)

brahmaid

csrftoken

Identityid

sessionid

Google (1)

g_state

Microsoft (7)

MUID

_clck

_clsk

SRM_I

SM

CLID

SRM_B

Google (7)

_gid

_ga_#

_gat_#

collect

AEC

G_ENABLED_IDPS

test_cookie

Webengage (2)

_we_us

WebKlipperAuth

LinkedIn (16)

ln_or

JSESSIONID

li_rm