A Step-by-Step Guide to PDF Chatbots with Langchain and Ollama

Subhradeep Rang 17 Nov, 2023 • 10 min read

Introduction

In an era where technology continues to transform the way we interact with information, the concept of a PDF chatbot brings a new level of convenience and efficiency to the table. This article delves into the intriguing realm of creating a PDF chatbot using Langchain and Ollama, where open-source models become accessible with minimal configuration. Say goodbye to the complexities of framework selection and model parameter adjustments, as we embark on a journey to unlock the potential of PDF chatbots. Discover how to seamlessly install Ollama, download models, and craft a PDF chatbot that provides intelligent responses to your queries. Let’s explore this exciting fusion of technology and document processing, making information retrieval easier than ever.

Learning Objectives

  • Understand how to install Ollama on your computer.
  • Learn how to download and run an open-source model using Ollama.
  • Discover the process of creating a PDF chatbot using Langchain and Ollama.

This article was published as a part of the Data Science Blogathon.

Prerequisites

To understand this article properly, you need:

  • Good knowledge of Python, and
  • Basic knowledge of Langchain, i.e., chains, vector stores, etc.

Langchain provides various types of functionality for creating LLM apps. It deserves a separate article in itself. If you don’t know what Langchain is, I suggest you go through some articles or tutorials about it. You can watch this video too.

What is Ollama?

Ollama gives you the ability to download open-source models and use them locally. It automatically downloads the models from the best source. If you have a dedicated GPU on your computer, it will run the model with GPU acceleration; you don’t need to set this up manually. You can even customize the model by changing its prompt (yes, you don’t need Langchain for that). Ollama is also available as a Docker image, so you can deploy your own model as a Docker container. Exciting, right? Now let’s see how you can install Ollama on your machine.

How to Install Ollama?

Unfortunately, Ollama is only available for macOS and Linux. But don’t worry, there is also a way for Windows users to use Ollama: WSL2. If you don’t have WSL2 on your computer, go through this article, where I explained everything about WSL2 and how to use it in VS Code. If you have already installed it, open Ubuntu and run the below command in the terminal.

curl https://ollama.ai/install.sh | sh

This will install Ollama in WSL2. If you are using macOS, visit here. Now you are ready to download a model using Ollama. Keep the terminal open; we are not done yet.
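
Once the script finishes, you can quickly verify that the ollama command is available:

ollama --version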

Downloading a Model

Ollama provides various models – llama2, llama2-uncensored, codellama, orca-mini etc. If you’d like to know about all the models available, you can go to this website. Here you will download the orca-mini 3b model. It is a Llama model trained on orca-style datasets created using the approaches defined in the Orca paper. Though this model is small (1.9 GB), it still gives a good response. To download this model, run the below command:

ollama run orca-mini

This command will download and run the orca-mini model in the terminal. Make sure that you have at least 8GB of RAM on your computer before you run this model.
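
If you want to confirm the download, you can list the models available locally from another terminal:

ollama list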

Once the model is running, ask it some questions and see how it responds. Here is an example:

[Image: an example conversation with the orca-mini model in the terminal]

You can see from the above example that it sometimes gives irrelevant information. We can fix this by changing the prompt, which we will do when we create the chatbot.
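
If you prefer, you can also bake a custom prompt directly into the model with an Ollama Modelfile, without touching Langchain at all. Below is a minimal sketch; the system prompt and the model name my-orca are just placeholders.

# Modelfile
FROM orca-mini
SYSTEM You are a helpful assistant. Answer briefly and only from the information you are given.

Then build and run the customized model:

ollama create my-orca -f Modelfile
ollama run my-orca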

Now that the model works correctly, let’s create the PDF chatbot using it.

Creating the Chatbot

Set up the Project Directory

If you don’t know how to create a project in WSL2, refer to the last part of this article, where I explained everything. Here we use VS Code as our IDE.

Installing Necessary Libraries

Before we start, let’s install the required libraries. Create a file named requirements.txt and add the dependencies below.

langchain
pymupdf
huggingface-hub
faiss-cpu
sentence-transformers

After that, just open the terminal in VS Code and run the below command.

>>> pip install -r requirements.txt

This will take some time. You don’t need to install any Python package for Ollama, as it is a standalone app. Now we can start coding our chatbot.

Creating Necessary Functions

To create a chatbot in Langchain, we have to follow some steps:

  • Reading the PDF file using any PDF loader from Langchain.
  • If the document is really big, it’s a good idea to break it into smaller parts, also called chunks. This way, we can make sure the model gets the right information for your question without using too many resources. It’s like giving the model just the pieces it needs instead of overwhelming it with everything at once. After all, every model has a token limit.
  • After that, we have to convert those chunks into vector embeddings so that the model gets a better understanding of the data. Then we have to create the vector store where we can store these embeddings and retrieve them efficiently when needed. Here we will use the FAISS vectorstore.
  • Now we are ready to query the PDF file. For this, we will use RetrievalQA from Langchain. It will take the vector store as a retriever and the model we are going to use. Some other parameters are also used according to our needs.

Import the Necessary Packages

These are the steps for a basic PDF chatbot. If you want to create a more complex chatbot, some additional steps are needed, such as adding memory, including routing, etc. Now let’s create a function for every step so that we don’t have to repeat the code multiple times for testing. First, we import the necessary packages:

# Importing the necessary packages
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
import textwrap

After that, we create our first function, which will load the PDF file. Here we read the PDF file using PyMuPDFLoader from Langchain.

# This will load the PDF file
def load_pdf_data(file_path):
    # Creating a PyMuPDFLoader object with file_path
    loader = PyMuPDFLoader(file_path=file_path)
    
    # loading the PDF file
    docs = loader.load()
    
    # returning the loaded document
    return docs

Then we have to split the documents into several chunks. Here we will use RecursiveCharacterTextSplitter from Langchain, which is the most popular splitter for text.

# Responsible for splitting the documents into several chunks
def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    
    # Initializing the RecursiveCharacterTextSplitter with
    # chunk_size and chunk_overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    
    # Splitting the documents into chunks
    chunks = text_splitter.split_documents(documents=documents)
    
    # returning the document chunks
    return chunks

Now it’s time for the embeddings. For this, we first load the embedding model using HuggingFaceEmbeddings from Langchain, then we use FAISS to create the vector store. The model used for embedding is all-MiniLM-L6-v2, which you will see later.

# function for loading the embedding model
def load_embedding_model(model_path, normalize_embedding=True):
    return HuggingFaceEmbeddings(
        model_name=model_path,
        model_kwargs={'device':'cpu'}, # here we will run the model with CPU only
        encode_kwargs = {
            'normalize_embeddings': normalize_embedding # keep True to compute cosine similarity
        }
    )


# Function for creating embeddings using FAISS
def create_embeddings(chunks, embedding_model, storing_path="vectorstore"):
    # Creating the embeddings using FAISS
    vectorstore = FAISS.from_documents(chunks, embedding_model)
    
    # Saving the vectorstore in the current directory
    vectorstore.save_local(storing_path)
    
    # returning the vectorstore
    return vectorstore
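
Since the vectorstore is saved to disk, you can also reload it later instead of recomputing the embeddings. Here is a minimal sketch, assuming the default storing_path from the function above:

# Reloading a previously saved FAISS index from disk
vectorstore = FAISS.load_local("vectorstore", embedding_model)  # pass the same embedding model used to build it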

Let’s create a custom prompt template so that the chatbot will work as expected. The default prompt for the orca-mini model is given below.

prompt = """
### System:
You are an AI Assistant that follows instructions extremely well. \
Help as much as you can.

### User:
{prompt}

### Response:

"""

So, keeping the prompt in this format gives us better responses. Here is the modified prompt template:

template = """
### System:
You are a respectful and honest assistant. You have to answer the user's \
questions using only the context provided to you. If you don't know the answer, \
just say you don't know. Don't try to make up an answer.

### Context:
{context}

### User:
{question}

### Response:
"""

Now we must create the question-answering chain. Here we use RetrievalQA from Langchain. RetrievalQA doesn’t give the chatbot memory, i.e., it just answers your questions but doesn’t remember previous conversations.

# Creating the chain for Question Answering
def load_qa_chain(retriever, llm, prompt):
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever, # here we are using the vectorstore as a retriever
        chain_type="stuff",
        return_source_documents=True, # including source documents in output
        chain_type_kwargs={'prompt': prompt} # customizing the prompt
    )
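
As mentioned above, RetrievalQA has no memory. If you later want the chatbot to remember previous turns, one option is ConversationalRetrievalChain with a ConversationBufferMemory; the sketch below is only an illustration and is not used in the rest of this article.

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# A sketch of a chain that also keeps the chat history (not used below)
def load_conversational_chain(retriever, llm):
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    return ConversationalRetrievalChain.from_llm(llm=llm, retriever=retriever, memory=memory)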

We will also create a function that will prettify the response. Here is the code:

# Prettifying the response
def get_response(query, chain):
    # Getting response from chain
    response = chain({'query': query})
    
    # Wrapping the text for better output in Jupyter Notebook
    wrapped_text = textwrap.fill(response['result'], width=100)
    print(wrapped_text)

Now that we have created all the necessary functions, it’s time to create the chatbot.

Creating Jupyter Notebook

First, create a Jupyter Notebook in your directory. For this, create a new file with the extension .ipynb. Also, don’t forget to install jupyter from pip; otherwise, the notebook will not run. Just run the below command to install jupyter.

>>> pip install jupyter

If VS Code doesn’t detect the kernel for Jupyter Notebook, you will see an option Select Kernel on the top right side. You will see some options below.

[Image: the Select Kernel options shown in VS Code]

Select Python Environments. It will again give you some options.

[Image: the Python environment options]

Select the Starred environment. After that, you are ready to use the Jupyter Notebook.

Importing Necessary Libraries

Now, we are going to import the necessary libraries. Here we will import three things:

  • The Python script in which we wrote all the functions (I named the file lang_funcs.py),
  • Ollama from langchain.llms, and
  • PromptTemplate from langchain.

Let’s import these libraries:

from lang_funcs import *
from langchain.llms import Ollama
from langchain import PromptTemplate

Loading Models

Now we have to load the orca-mini model and the embedding model named all-MiniLM-L6-v2. This embedding model is small but effective.

# Loading orca-mini from Ollama
llm = Ollama(model="orca-mini", temperature=0)

# Loading the Embedding Model
embed = load_embedding_model(model_path="all-MiniLM-L6-v2")

Ollama models are hosted locally on port 11434. We don’t have to specify it, as it is already the default in Langchain’s Ollama() class. If the embedding model is not already on your computer, it will be downloaded automatically from Hugging Face. Just wait for some time and you are good to go.
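
If you ever run Ollama on a non-default host or port, you can point Langchain to it explicitly. The small sketch below uses the default URL, so it is equivalent to the line above:

# Explicitly pointing Langchain at the local Ollama server
llm = Ollama(model="orca-mini", temperature=0, base_url="http://localhost:11434")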

Loading the Data and Creating the Vectorstore

It’s time to load the data and create the embedding. Here we will use a PDF book on ML.

# loading and splitting the documents
docs = load_pdf_data(file_path="data/ml_book.pdf")
documents = split_docs(documents=docs)

# creating vectorstore
vectorstore = create_embeddings(documents, embed)

# converting vectorstore to a retriever
retriever = vectorstore.as_retriever()
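
By default, the retriever returns the top few most similar chunks for each query. If you want to control that number, you can pass search_kwargs; the value 4 below is just an example:

# Retrieving the 4 most similar chunks for every query
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})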

It will take some time to create the embeddings, so be patient. Programming is all about patience, isn’t it?

You are Almost There!

I know you have come a long way. Now we have to create the chain and start testing our chatbot. So let’s do this.

# Creating the prompt from the template which we created before
prompt = PromptTemplate.from_template(template)

# Creating the chain
chain = load_qa_chain(retriever, llm, prompt)

Now we are ready to test our chatbot.

Testing the Chatbot

Now let’s ask some questions to our chatbot. Here are some examples:

>>>get_response("What is random forest?", chain)
>>> Random Forest is an ensemble learning technique that uses decision trees as base classifiers to
create a new classifier. It is a type of ensemble learning method that combines multiple decision
trees to improve the accuracy and reduce overfitting of the final model. The Random Forest algorithm
introduces extra randomness when growing trees, instead of searching for the very best feature when
splitting a node (like regular Decision Trees do). This results in a greater tree diversity, which
trades a higher bias for a lower variance. It is also more computationally efficient than other
ensemble learning methods like Bagging and Boosting.

>>>get_response("What is Voting Classifier?", chain)
>>> A voting classifier is a type of machine learning algorithm that predicts the class of a given
input data point based on the majority vote of multiple classifiers. In other words, it takes the
predictions of several different classifiers and predicts the class that gets the most votes. This
approach can be used to improve the accuracy of machine learning models by using the diversity of
predictions from different classifiers. The hard voting classifier is a type of voting classifier
that predicts the class with the highest majority vote.

>>>get_response("What is the difference between voting classifier and random forest?", chain)
>>> Voting classifiers and Random Forests are two different approaches to creating an ensemble of
classifiers.   Voting classifiers use majority vote to predict the class that receives the most
votes from the classifiers in the ensemble. This approach is based on the idea that the majority
opinion is likely to be correct, and that by aggregating the predictions of multiple classifiers, we
can improve our accuracy. The hard voting classifier is a simple implementation of this approach.
On the other hand, Random Forests is an ensemble learning method that uses decision trees as base
classifiers. It is trained via the bagging method, where each tree is grown from a randomly selected
subset of the training data. The Random Forest classifier is more convenient and optimized for
Decision Trees than voting classifiers, and it generally yields an overall better model.

It really gives good responses. You can ask some more questions to see how it responds.

Conclusion

I hope you now have a clear understanding of how to create a PDF chatbot using Langchain and Ollama. Ollama is a new kid in this domain, and it really makes our lives easier. You have already seen how we initialized the orca-mini model in just one line; otherwise, you would have to use HuggingFacePipeline from Langchain.

Key Takeaways

  • Ollama Simplifies Model Deployment: Ollama simplifies the deployment of open-source models by providing an easy way to download and run them on your local computer.
  • PDF Chatbot Development: Learn the steps involved in creating a PDF chatbot, including loading PDF documents, splitting them into chunks, and creating a chatbot chain.
  • Customization for Better Responses: Understand how to customize prompts and templates to improve the responses of your chatbot.

You can find all the code used in this article here.

Frequently Asked Questions

Q1. How to know which model I have to use?

A. It totally depends on your use case. But in my opinion, if you are creating a chatbot like the one in this article, try a relatively small model; otherwise, the performance of the chatbot will not be good. Here we use orca-mini because it is not very large, yet it gives good responses.

Q2. Which PDFLoader should I use?

A. If you want to read a single-column PDF, you can use PyPDFLoader. If you are going to read a multi-column PDF, you should use PyMuPDFLoader. This is my preference; you can try other PDF loaders as well.

Q3. What are the requirements of running WSL2 on my computer?

A. For running WSL2, you need at least 4GB of RAM on your computer. If you have Windows 11 installed, the process is much smoother.

Q4. Though I am using Ollama, it looks like the chatbot creation process is still tedious. Why is that?

A. Langchain provides us with various customizations, and more customizability means writing more code. You can use LlamaIndex instead, as it lets you create a chatbot much more easily.

Q5. Can I create this chatbot using Embedchain?

A. At the time of writing this article, Embedchain does not support Ollama. But if you want to create this chatbot without Ollama, you can do that.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
