Comprehensive Guide to Building AI Agents from Scratch

Ketan Kumar 16 Jul, 2024
11 min read

Introduction

This article shows how to build AI agents from scratch and introduces the ReAct (Reason + Act) pattern, which extends an agent's capabilities by letting a language model call external tools. It covers the required tools and libraries, environment setup, and implementation, as well as testing, debugging, and optimizing the agent. Whether you are a developer or an enthusiast, this tutorial gives you the skills to create effective AI agents.

Learning Objectives

  • Grasp the fundamental concepts of AI agents and their significance in various applications.
  • Learn how to implement the Reason + Act (ReAct) pattern in AI agents to enhance their capabilities.
  • Set up the necessary tools and libraries required to build AI agents from scratch.
  • Develop an AI agent using Python, integrate various actions, and implement a reasoning loop.
  • Effectively test and debug the AI agent to ensure it functions as expected.
  • Improve the robustness and security of the AI agent and add more capabilities.
  • Identify practical applications of AI agents and understand their future prospects.

This article was published as a part of the Data Science Blogathon.

Understanding AI Agents

AI agents are autonomous systems that perceive their environment through sensors or other inputs, process that information, and act to accomplish predefined goals. They range from basic bots to sophisticated systems that adapt and learn over time. Typical examples include recommendation engines such as those used by Netflix and Amazon, virtual assistants such as Siri and Alexa, and self-driving cars from Tesla and Waymo.

These agents are also essential in a number of sectors. Robotic process automation (RPA) programs such as UiPath and Blue Prism automate repetitive processes, and healthcare diagnostic systems such as those from DeepMind and IBM Watson Health help diagnose diseases and recommend treatments. Within their domains, AI agents can greatly improve productivity, precision, and customization.

Why Are AI Agents Important?

These agents play a critical role in improving our daily lives and accomplishing specific objectives.

AI agents are significant because they can:

  • reduce the human labor needed for routine operations, increasing productivity and efficiency.
  • analyze enormous volumes of data to offer conclusions and suggestions that support decision-making.
  • provide individualized interactions and assistance through chatbots and virtual assistants.
  • enable complex applications in industries such as banking, transportation, and healthcare.

In essence, AI agents are pivotal in driving the next wave of technological advancements, making systems smarter and more responsive to user needs.

Applications and Use Cases of AI Agents

AI agents have a wide range of applications across various industries. Here are some notable use cases:

  • Customer Service: AI agents in the form of chatbots and virtual assistants handle customer inquiries, resolve issues, and provide personalized support. They can operate 24/7, offering consistent and efficient service.
  • Finance: Financial forecasting, algorithmic trading, and fraud detection are applications of AI agents. They perform trades based on market trends, examine transaction data, and spot questionable patterns.
  • Healthcare: AI agents assist in diagnosing diseases, recommending treatments, and monitoring patient health. They analyze medical data, provide insights, and support clinical decision-making.
  • Marketing: AI agents personalize marketing campaigns, segment audiences, and optimize ad spend. They analyze customer data, predict behavior, and tailor content to individual preferences.
  • Supply Chain Management: AI agents forecast demand, optimize inventory levels, and streamline logistics. They analyze data from manufacturers, suppliers, and retailers to keep operations running smoothly.

Brief Introduction to the ReAct Pattern

The ReAct (Reason + Act) pattern is a design pattern that combines reasoning with action-taking to improve the capabilities of AI agents. Large language models (LLMs) such as GPT-3.5 and GPT-4 benefit greatly from it, because it lets them use external tools and APIs for tasks they cannot handle from their training data alone, such as looking up current information or performing exact calculations.

In a ReAct loop, the agent reasons about the input, acts on it by calling an external resource, and then folds the result back into its reasoning. This lets it give more accurate and contextually relevant responses and greatly expands what it can do.

The ReAct pattern operates in a cyclic loop consisting of the following steps:

  • Thought: The AI agent processes the input and reasons about what needs to be done. This involves understanding the question or command and determining the appropriate action to take.
  • Action: Based on the reasoning, the agent performs a predefined action. This could involve searching for information, performing calculations, or interacting with external APIs.
  • Pause: The agent stops generating and waits while the surrounding program executes the action. This is a crucial step: the model cannot run tools itself, so it hands control back to the code and waits for the result.
  • Observation: The agent observes the results of the action. It analyzes the output received from the action to understand the information or results obtained.
  • Answer: When it has enough information, the agent uses the observed results to generate a response for the user, completing the loop. If it still needs more information, it starts another Thought and Action instead.
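To make the cycle concrete, here is a minimal, self-contained sketch that replaces the language model with a scripted list of replies (hypothetical stand-ins, so it runs without an API key). The model produces the Thought, Action and PAUSE; the surrounding program runs the action and sends back an Observation until a reply without an Action line, the Answer, appears. The function names `run_react` and `multiply` are illustrative, not part of any library.

```python
import re

# Same "Action: <name>: <input>" format the prompt in this article uses
ACTION_RE = re.compile(r"^Action: (\w+): (.*)$")


def run_react(question, model, actions, max_turns=5):
    """Drive the Thought -> Action -> PAUSE -> Observation loop until an Answer appears.

    `model` is any callable that takes the next prompt and returns the model's reply;
    `actions` maps action names to plain Python functions.
    """
    transcript = []  # every reply and observation, for inspection
    next_prompt = question
    for _ in range(max_turns):
        reply = model(next_prompt)
        transcript.append(reply)
        match = next((m for m in map(ACTION_RE.match, reply.splitlines()) if m), None)
        if match is None:
            return reply, transcript  # no Action line, so this reply is the Answer
        name, arg = match.groups()
        # PAUSE: the model has stopped; our code runs the requested tool
        observation = actions[name](arg)
        next_prompt = f"Observation: {observation}"
        transcript.append(next_prompt)
    return None, transcript  # gave up after max_turns


def multiply(expr):
    """Toy tool: multiply two integers written as 'a * b'."""
    a, b = expr.split("*")
    return int(a) * int(b)


# Hypothetical scripted replies standing in for a real LLM (no API call)
replies = iter([
    "Thought: I should multiply the numbers.\nAction: multiply: 15 * 25\nPAUSE",
    "Answer: 15 * 25 = 375",
])
answer, transcript = run_react("What is 15 * 25?", model=lambda _: next(replies),
                               actions={"multiply": multiply})
print(answer)  # Answer: 15 * 25 = 375
```

The query function built later in this article follows the same shape, with a real chat model in place of the scripted replies.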

Importance and Benefits of Using ReAct

The ReAct pattern is important for several reasons:

  • Enhanced Capabilities: By integrating external actions, the AI agent can perform tasks that require specific information or computations, thus enhancing its overall capabilities.
  • Improved Accuracy: The pattern allows the AI agent to fetch real-time information and perform accurate calculations, leading to more precise and relevant responses.
  • Flexibility: The ReAct pattern makes AI agents more flexible and adaptable to various tasks. They can interact with different APIs and tools to perform a wide range of actions.
  • Scalability: This pattern allows for the addition of new actions and capabilities over time, making the AI agent scalable and future-proof.
  • Real-World Applications: The ReAct pattern enables AI agents to be deployed in real-world scenarios where they can interact with dynamic environments and provide valuable insights and assistance.

Tools and Libraries Needed

Python is a versatile and powerful programming language that is widely used in AI and machine learning due to its simplicity and extensive library support. For building AI agents, several Python libraries are essential:

  • OpenAI API: The openai Python library allows you to interact with OpenAI’s language models, such as GPT-3.5 and GPT-4. It provides the necessary functions to generate text, answer questions, and perform various language-related tasks.
  • httpx: This is a powerful HTTP client for Python that supports asynchronous requests. It is used to interact with external APIs, fetch data, and perform web searches.
  • re (Regular Expressions): This module provides support for regular expressions in Python. It is used to parse and match patterns in strings, which is useful for processing the AI agent’s responses.

Introduction to OpenAI API and httpx Library

The OpenAI API is a robust platform that provides access to advanced language models developed by OpenAI. These models can understand and generate human-like text, making them ideal for building AI agents. With the OpenAI API, you can:

  • Generate text based on prompts
  • Answer questions
  • Perform language translations
  • Summarize text
  • And much more
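As a minimal sketch of the first item, generating text from a prompt, the snippet below assumes the current openai Python package (version 1.0 or later) and an OPENAI_API_KEY environment variable. The helper names `build_messages` and `generate` and the default model name are illustrative choices, not requirements of the API. The openai import sits inside `generate` so the message-building part can be tried without the package installed.

```python
def build_messages(prompt, system=None):
    """Build the chat-format message list the Chat Completions API expects."""
    messages = []
    if system:
        # Optional instructions that steer the model's behavior
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


def generate(prompt, system=None, model="gpt-3.5-turbo"):
    """Send one prompt to the Chat Completions API and return the reply text."""
    from openai import OpenAI  # openai>=1.0 client; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=build_messages(prompt, system),
    )
    return completion.choices[0].message.content
```

With a valid key, a call such as `generate("Translate 'hello' into French.")` would return the model's reply as a plain string; summarization or question answering only changes the prompt.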

The httpx library is an HTTP client for Python that supports both synchronous and asynchronous requests. It is designed to be easy to use while providing powerful features for making web requests. With httpx, you can:

  • Send GET and POST requests
  • Handle JSON responses
  • Manage sessions and cookies
  • Perform asynchronous requests for better performance

Together, the OpenAI API and httpx library provide the foundational tools needed to build and enhance AI agents, enabling them to interact with external resources and perform a wide range of actions.

Setting Up the Environment

Let us now set up the environment in the following steps:

Step 1: Installation of Required Libraries

To get started with building your AI agent, you need to install the necessary libraries. Here are the steps to set up your environment:

  • Install Python: Ensure you have Python installed on your system. You can download it from the official Python website.
  • Set Up a Virtual Environment: It’s good practice to create a virtual environment for your project to manage dependencies. Run the following commands to set up a virtual environment:
python -m venv ai_agent_env
source ai_agent_env/bin/activate  # On Windows, use `ai_agent_env\Scripts\activate`
  • Install the OpenAI Library and httpx: Use pip to install the required libraries:
pip install openai httpx
  • Additional Libraries: The re module used for regular expressions is part of the Python Standard Library, so no separate installation is required.

Step 2: Setting Up API Keys and Environment Variables

To use the OpenAI API, you need an API key. Follow these steps to set up your API key:

  • Obtain an API Key: Sign up for an account on the OpenAI website and obtain your API key from the API section.
  • Set Up Environment Variables: Store your API key in an environment variable to keep it secure. Add the following line to your .bashrc or .zshrc file (or use the appropriate method for your operating system):
export OPENAI_API_KEY='your_openai_api_key_here'
  • Access the API Key in Your Code: In your Python code, you can read the API key using the os module:
import os
import openai
openai.api_key = os.getenv('OPENAI_API_KEY')  # optional: the openai library also reads this variable automatically
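If the variable is missing, the problem otherwise only surfaces as an authentication error on the first request. A small helper like the hypothetical `require_api_key` below, a sketch using only the standard library, fails early with a clearer message.

```python
import os


def require_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is unset."""
    key = os.getenv(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell, e.g. "
            f"export {var_name}='your_openai_api_key_here'"
        )
    return key
```

Calling it once at startup, before creating the agent, keeps configuration errors separate from errors raised later inside the reasoning loop.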

With the environment set up, you are now ready to start building your AI agent.

Building the AI Agent

Let us now build the AI agent.

Creating the Basic Structure of the AI Agent

To build the AI agent, we will create a class that handles interactions with the OpenAI API and manages the reasoning and actions. Here’s a basic structure to get started:

import openai
import re
import httpx

class ChatBot:
    def __init__(self, system=""):
        self.system = system
        self.messages = []
        if self.system:
            # The system message carries the ReAct instructions
            self.messages.append({"role": "system", "content": system})
    
    def __call__(self, message):
        # Record the user turn, get a reply, and keep both in the history
        self.messages.append({"role": "user", "content": message})
        result = self.execute()
        self.messages.append({"role": "assistant", "content": result})
        return result
    
    def execute(self):
        # openai>=1.0 interface (openai.ChatCompletion was removed in 1.0);
        # the module-level client reads OPENAI_API_KEY from the environment
        completion = openai.chat.completions.create(model="gpt-3.5-turbo", messages=self.messages)
        return completion.choices[0].message.content

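This class initializes the AI agent with an optional system message and handles user interactions. The __call__ method appends each user message to the conversation history, generates a response with the OpenAI API, and stores that response as well, so the model sees the full conversation on every call.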

Implementing the ReAct Pattern

To implement the ReAct pattern, we need to define the loop of Thought, Action, Pause, Observation, and Answer. Here’s how we can incorporate this into our AI agent:

Define the Prompt

prompt = """
You run in a loop of Thought, Action, PAUSE, Observation.
At the end of the loop you output an Answer.
Use Thought to describe your thoughts about the question you have been asked.
Use Action to run one of the actions available to you - then return PAUSE.
Observation will be the result of running those actions.

Your available actions are:
calculate:
e.g. calculate: 4 * 7 / 3
Runs a calculation and returns the number - uses Python so be sure to use floating point
syntax if necessary

wikipedia:
e.g. wikipedia: Django
Returns a summary from searching Wikipedia

simon_blog_search:
e.g. simon_blog_search: Django
Search Simon's blog for that term

Example session:
Question: What is the capital of France?
Thought: I should look up France on Wikipedia
Action: wikipedia: France
PAUSE

You will be called again with this:
Observation: France is a country. The capital is Paris.

You then output:
Answer: The capital of France is Paris
""".strip()

Define the query Function

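# Matches lines such as "Action: wikipedia: France" in the model's reply;
# the query function uses it to find which action to run.
action_re = re.compile(r'^Action: (\w+): (.*)$')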

The query function runs the ReAct loop by sending the question to the AI agent, parsing the actions, executing them, and feeding the observations back into the loop.
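To see what action_re extracts, the short sketch below applies the same pattern to a sample model reply (the reply text is made up for illustration). Only a line of the form `Action: name: input` matches, and its two groups are the action name and its argument.

```python
import re

# Same pattern as action_re: "Action: <name>: <input>" on its own line
action_re = re.compile(r"^Action: (\w+): (.*)$")

# A made-up model reply in the format the prompt asks for
reply = """Thought: I should look up France on Wikipedia
Action: wikipedia: France
PAUSE"""

# Without re.MULTILINE, ^ and $ anchor to the whole string, so test each line separately
matches = [m for m in (action_re.match(line) for line in reply.split("\n")) if m]
action, action_input = matches[0].groups()
print(action, "|", action_input)  # wikipedia | France
```

Splitting the reply into lines first is what lets the anchored pattern find an Action line that is not the first line of the reply.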

Implementing Actions

Let us now implement the actions.

Action: Wikipedia Search

The Wikipedia search action allows the AI agent to search for information on Wikipedia. Here’s how to implement it:

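def wikipedia(q):
    # Query the MediaWiki search API and return the snippet of the top result
    response = httpx.get("https://en.wikipedia.org/w/api.php", params={
        "action": "query",
        "list": "search",
        "srsearch": q,
        "format": "json"
    }, timeout=10.0)
    response.raise_for_status()  # surface HTTP errors instead of failing on .json()
    results = response.json()["query"]["search"]
    if not results:
        return f"No Wikipedia results for {q!r}"
    return results[0]["snippet"]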
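The snippet returned by the MediaWiki search API is an HTML fragment: matched terms are wrapped in `<span class="searchmatch">` tags, and some characters arrive as HTML entities. A small post-processing step, sketched here with the standard library (the helper name `clean_snippet` is hypothetical), hands the model plain text instead.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")  # any HTML tag, e.g. <span class="searchmatch">


def clean_snippet(snippet):
    """Strip HTML tags from a Wikipedia search snippet and decode HTML entities."""
    return html.unescape(TAG_RE.sub("", snippet))


# A made-up snippet in the shape the search API returns
print(clean_snippet('Django is a <span class="searchmatch">web</span> framework &amp; more'))
# Django is a web framework & more
```

Inside the wikipedia action, the final line could then return `clean_snippet(results[0]["snippet"])` rather than the raw fragment.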

Action: Blog Search

The blog search action allows the AI agent to search Simon Willison’s blog through its public Datasette instance, using a full-text SQL query. Here’s how to implement it:

def simon_blog_search(q):
    response = httpx.get("https://datasette.simonwillison.net/simonwillisonblog.json", params={
        "sql": """
        select
          blog_entry.title || ': ' || substr(html_strip_tags(blog_entry.body), 0, 1000) as text,
          blog_entry.created
        from
          blog_entry join blog_entry_fts on blog_entry.rowid = blog_entry_fts.rowid
        where
          blog_entry_fts match escape_fts(:q)
        order by
          blog_entry_fts.rank
        limit
          1
        """.strip(),
        "_shape": "array",
        "q": q,
    })
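    response.raise_for_status()
    rows = response.json()  # a list of rows because of "_shape": "array"
    return rows[0]["text"] if rows else f"No blog results for {q!r}"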

Action: Calculation

The calculation action allows the AI agent to perform mathematical calculations. Here’s how to implement it:

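def calculate(what):
    # WARNING: eval() executes any Python expression, so model-generated input
    # can run arbitrary code. Only use this for trusted experiments.
    return eval(what)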
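Because eval executes any Python expression the model writes, a safer option is to evaluate arithmetic only. The sketch below, a hypothetical `safe_calculate` built on Python’s ast module, walks the parsed expression and allows only numbers, the binary operators `+ - * / // % **`, and unary plus/minus; anything else raises ValueError. It is one possible approach, not the original implementation.

```python
import ast
import operator

# Binary and unary operators the calculator is allowed to apply
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression):
    """Evaluate a purely arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Plain int/float literals only (bool is rejected by the exact type check)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, etc. are refused
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_calculate("4 * 7 / 3"))
```

Registering `"calculate": safe_calculate` in known_actions swaps it in. Malformed input still raises SyntaxError and division by zero raises ZeroDivisionError, and very large exponents can be slow, so you may want to cap `**` operands as well.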

Adding Actions to the AI Agent

Next, we need to register these actions in a dictionary so the AI agent can use them:

known_actions = {
    "wikipedia": wikipedia,
    "calculate": calculate,
    "simon_blog_search": simon_blog_search
}

Integrating Actions with the AI Agent

To integrate the actions with the AI agent, we need to ensure that the query function can handle the different actions and feed the observations back into the reasoning loop. Here’s how to complete the integration:

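def query(question, max_turns=5):
    """Run the ReAct loop until the model answers or max_turns is reached."""
    bot = ChatBot(prompt)
    next_prompt = question
    result = ""
    for _ in range(max_turns):
        result = bot(next_prompt)
        print(result)
        # Look for an "Action: name: input" line in the model's reply
        actions = [m for m in (action_re.match(line) for line in result.split("\n")) if m]
        if not actions:
            # No action requested: the reply contains the final Answer
            return result
        action, action_input = actions[0].groups()
        if action not in known_actions:
            raise Exception(f"Unknown action: {action}: {action_input}")
        print(f" -- running {action} {action_input}")
        observation = known_actions[action](action_input)
        print("Observation:", observation)
        # Feed the result back so the model can continue reasoning
        next_prompt = f"Observation: {observation}"
    return result  # max_turns reached without a final answer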

With this setup, the AI agent can reason about the input, perform actions, observe the results, and generate responses.

Testing and Debugging

Let us now test and debug the agent.

Running Sample Queries

To test the AI agent, you can run sample queries and observe the results. Here are a few examples:

print(query("What does England share borders with?"))
print(query("Has Simon been to Madagascar?"))
print(query("Fifteen * twenty five"))

Debugging Common Issues

While testing, you might encounter some common issues. Here are a few tips to debug them:

  • API Errors: Ensure your API keys are correctly set and have the necessary permissions.
  • Network Issues: Check your internet connection and ensure the endpoints you are calling are reachable.
  • Incorrect Outputs: Verify the logic in your action functions and ensure they return the correct results.
  • Unhandled Actions: Make sure every action described in the prompt is registered in the known_actions dictionary; otherwise query raises an "Unknown action" exception.
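Instead of raising an exception when the model asks for an action that does not exist, the loop can send the error back as an Observation so the model can correct itself on the next turn. Here is a minimal sketch of that dispatch step, using a hypothetical `dispatch` helper and a toy action table.

```python
def dispatch(action, action_input, known_actions):
    """Run a registered action, or return an error message the model can react to."""
    if action not in known_actions:
        available = ", ".join(sorted(known_actions))
        return f"Error: unknown action '{action}'. Available actions: {available}"
    return known_actions[action](action_input)


# Toy action table for illustration
toy_actions = {"echo": lambda text: text, "shout": lambda text: text.upper()}

print(dispatch("shout", "hello", toy_actions))   # HELLO
print(dispatch("search", "hello", toy_actions))  # Error: unknown action 'search'. Available actions: echo, shout
```

In query, `observation = dispatch(action, action_input, known_actions)` would replace the unknown-action check; the model then receives `Observation: Error: ...` and can pick one of the listed actions.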

Improving the AI Agent

Let us now improve the AI agent.

Enhancing Robustness and Security

To make the AI agent more robust and secure:

  • Validate Inputs: Validate all inputs to prevent injection attacks, especially in the calculate function, which passes model-generated text straight to eval.
  • Error Handling: Implement error handling in the action functions to gracefully manage exceptions.
  • Logging: Add logging to track the agent’s actions and observations for easier debugging.
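The error-handling and logging suggestions can be combined in a small wrapper around each action. The sketch below (the decorator name `logged_action` and the toy `first_word` action are hypothetical) logs every call and its result with Python’s standard logging module and turns exceptions, such as network errors or an empty search result, into an error string, so one failed tool call does not crash the whole loop.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("ai_agent")


def logged_action(func):
    """Wrap an action so calls are logged and exceptions become error observations."""

    @functools.wraps(func)
    def wrapper(action_input):
        logger.info("action=%s input=%r", func.__name__, action_input)
        try:
            result = func(action_input)
        except Exception as exc:  # deliberately broad: any tool failure is reported to the model
            logger.exception("action=%s failed", func.__name__)
            return f"Error while running {func.__name__}: {exc}"
        logger.info("action=%s result=%r", func.__name__, result)
        return result

    return wrapper


@logged_action
def first_word(text):
    """Toy action for illustration: return the first word of the input."""
    return text.split()[0]


print(first_word("hello world"))  # hello
print(first_word(""))             # the IndexError is logged and returned as an error string
```

Wrapping the existing actions is then a matter of registering `logged_action(wikipedia)`, `logged_action(calculate)`, and so on in known_actions.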

Adding More Actions and Capabilities

To enhance the AI agent’s capabilities, you can add more actions such as:

  • Weather Information: Integrate with a weather API to fetch real-time weather data.
  • News Search: Implement a news search action to fetch the latest news articles.
  • Translation: Add a translation action using a translation API to support multilingual queries.
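Each new action has to appear in two places: as a Python function in known_actions and as a description in the prompt, so the model knows it exists. One way to keep them in sync, sketched below with a hypothetical `action` decorator and a toy `word_count` action standing in for a real weather, news, or translation API, is to register each function together with a usage example and generate the prompt’s action list from that registry, in the same format as the prompt shown earlier.

```python
known_actions = {}
action_docs = []


def action(example):
    """Register a function as an agent action, recording a usage example for the prompt."""

    def register(func):
        known_actions[func.__name__] = func
        # Assumes every action has a docstring describing what it returns
        action_docs.append(f"{func.__name__}:\ne.g. {example}\n{func.__doc__.strip()}")
        return func

    return register


@action(example="word_count: the quick brown fox")
def word_count(text):
    """Returns the number of words in the text"""
    return len(text.split())


def actions_prompt():
    """Build the 'Your available actions are:' section of the system prompt."""
    return "Your available actions are:\n" + "\n\n".join(action_docs)


print(actions_prompt())
```

A real weather or translation action would follow the same pattern, with its function calling the chosen API through httpx.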

Real-World Applications

  • Customer Support: AI agents can handle customer inquiries, resolve issues, and provide personalized recommendations.
  • Healthcare: AI agents assist in diagnosing diseases, recommending treatments, and monitoring patient health.
  • Finance: AI agents detect fraud, execute trades, and provide financial advice.
  • Marketing: AI agents personalize marketing campaigns, segment audiences, and optimize ad spend.

Future Prospects and Advancements

The future of AI agents is promising, with advancements in machine learning, natural language processing, and AI ethics. Emerging trends include:

  • Autonomous Systems: More sophisticated autonomous systems capable of handling complex tasks.
  • Human-AI Collaboration: Enhanced collaboration between humans and AI agents for improved decision-making.
  • Ethical AI: Focus on developing ethical AI agents that prioritize privacy, fairness, and transparency.

Conclusion

In this comprehensive guide, we explored the concept of AI agents, their significance, and the ReAct pattern that enhances their capabilities. We covered the necessary tools and libraries, set up the environment, and walked through building an AI agent from scratch. We also discussed implementing actions, integrating them with the AI agent, and testing and debugging the system. Finally, we looked at real-world applications and future prospects of AI agents.

By following this guide, you now have the knowledge to build your own AI agents from scratch. Experiment with different actions, enhance the agent’s capabilities, and explore new possibilities in the exciting field of artificial intelligence.

Key Takeaways

  • Understanding the core concepts and significance of AI agents.
  • Implementation of the ReAct pattern to allow AI agents to perform actions and reason about their observations.
  • Knowledge of the essential tools and libraries like OpenAI API, httpx, and Python regular expressions.
  • A detailed guide on building an AI agent from scratch, including defining actions and integrating them.
  • Techniques for effectively testing and debugging AI agents.
  • Strategies to enhance the AI agent’s capabilities and ensure its robustness and security.
  • Practical examples of how AI agents are used in various industries and their future advancements.

Frequently Asked Questions

Q1. What is the ReAct pattern in AI?

A. The ReAct pattern (Reason + Act) involves implementing additional actions that an AI agent can take, like searching Wikipedia or running calculations, and teaching the agent to request these actions and process their results.

Q2. What tools and libraries are required to build an AI agent from scratch?

A. Essential tools and libraries include Python, OpenAI API, httpx for HTTP requests, and Python’s regular expressions (re) library.

Q3. How can I ensure the security of my AI agent, especially when using actions like eval?

A. Validate inputs thoroughly to prevent injection attacks, use sandboxing techniques where possible, implement error handling, and log actions for monitoring and debugging.

Q4. Can I add more actions to my AI agent beyond those described in the guide?

A. Yes, you can add various actions such as fetching weather information, searching for news articles, or translating text using appropriate APIs and integrating them into the AI agent’s reasoning loop.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Hey everyone, Ketan Kumar here! I'm an M.Sc. student at VIT AP with a burning passion for Generative AI. My expertise lies in crafting machine learning models and wielding Natural Language Processing for innovative projects. Currently, I'm putting this knowledge to work in drug discovery research at Syngene International, exploring the potential of LLMs. Always eager to connect and delve deeper into the ever-evolving world of data science!
