Launching into Autogen: Exploring the Basics of a Multi-Agent Framework

Sunil Kumar Dash 22 Nov, 2023 • 8 min read

Introduction

Embark on a thrilling journey into the future of software development with ‘Launching into Autogen: Exploring the Basics of a Multi-Agent Framework.’ In the wake of OpenAI’s ChatGPT, a specialized realm known as LLM agents is experiencing an unprecedented surge, revolutionizing AI agent development. From automating mundane tasks to tackling challenges in dynamic decision-making, LLM agents are pushing the boundaries of what was once deemed impossible.

As we step into the era of spatial computing, envision a world where computers seamlessly merge with reality, and the significance of AI agents becomes paramount. Imagine instructing agents through words and gestures as they execute tasks with unparalleled reasoning and acting capabilities. However, we’re at the dawn of the AI agent revolution, witnessing the birth of new infrastructures, tools, and frameworks that empower agents to tackle increasingly complex tasks. Autogen, a cutting-edge framework for crafting multi-agent chat systems, takes center stage in our exploration.

Join us in this article as we unravel the intricacies of AI agents in the early stages of the revolution, delving into the capabilities of Autogen and discovering how to bring these intelligent entities to life.


Learning Objectives

  • Understand what LLM agents are
  • Learn what Autogen is and explore the basics of building Agents with Autogen
  • Build Agents with Autogen and OpenAI APIs
  • Explore the real-world use cases of LLM Agents

This article was published as a part of the Data Science Blogathon.

What are LLM Agents?

Vanilla language models are great at many tasks, such as translation and question answering. However, their knowledge and capabilities are limited; on its own, an LLM is like a mason building a house without tools. Yet it has been observed that LLMs can reason and act when given the necessary tools. Most LLMs have limited knowledge of the world, but we can augment them with information from custom sources via prompting.

We can achieve this via two methods: Retrieval Augmented Generation (RAG) and LLM agents. With RAG, we feed models information through custom, hard-coded pipelines. With agents, the LLM decides, based on its own reasoning, which of the tools at its disposal to use. For example, GPT-4 with a Serper tool can browse the internet and answer accordingly, or it can fetch and analyze stock performance when given access to a Yahoo Finance tool. This combination of an LLM, tools, and a framework for reasoning and taking action is what an AI agent is.
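The reason-act combination described above can be sketched in a few lines of plain Python. The tool functions and the keyword-based tool picker below are illustrative stand-ins; in a real agent, the LLM itself does the reasoning and tool selection:

```python
# A minimal, hand-rolled sketch of an agent's reason-act loop. The tools and
# the `pick_tool` heuristic are hypothetical stand-ins, not a real agent API.
def search_web(query: str) -> str:
    return f"web results for: {query}"

def get_stock_data(query: str) -> str:
    return f"stock data for: {query}"

TOOLS = {"search": search_web, "stocks": get_stock_data}

def pick_tool(question: str) -> str:
    # A real agent would reason with the LLM here; a keyword check stands in.
    return "stocks" if "stock" in question.lower() else "search"

def agent(question: str) -> str:
    tool_name = pick_tool(question)            # reason: choose a tool
    observation = TOOLS[tool_name](question)   # act: call the tool
    return f"[{tool_name}] {observation}"      # respond using the observation

print(agent("How did Apple stock perform this year?"))
print(agent("Who won the world cup?"))
```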

There has been a rapid rise in platforms and tools for building LLM agents. Autogen is one such tool. So, let’s understand what Autogen is and how to create LLM agents with it.

What is Autogen?

Autogen is an open-source tool from Microsoft for building robust multi-agent applications, designed from the ground up with multi-agent communication in mind. It lets us create LLM applications in which multiple agents converse with each other to find solutions to a given problem. The agents are highly customizable, meaning we can guide them to perform specific tasks. Autogen also integrates well with the Langchain tooling ecosystem, which means we can leverage existing Langchain tools to augment our agents.


To accomplish tasks, Autogen provides different types of agents, such as:

  • Assistant Agent: This agent is responsible for accomplishing tasks such as coding, reviewing, etc.
  • User Proxy Agent: As the name suggests, this agent acts on behalf of the end user. It is responsible for bringing humans into the agent loop to guide conversations.
  • Teachable Agent: This agent is configured to be highly teachable. We can feed it explicit information that is absent from the underlying LLM.

For most use cases, we only need an Assistant Agent and a User Proxy Agent. There are also other agents, such as the RetrieveAssistantAgent and RetrieveUserProxyAgent, which are configured for RAG. So, let’s see how we can configure agents with Autogen.

Here is a diagram of a typical multi-agent workflow.

Multi-Agent workflow

Build Agents with Autogen

Now, let’s dive into configuring Autogen agents. But before that, set up the environment. If the use case requires code execution, the agents will run it in the current environment, and as per the official documentation, this is best done inside a container. To get started quickly, you can use GitHub Codespaces. Make sure you install “pyautogen”.
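Assuming a recent Python environment, the install is a single pip command (note that the package on PyPI is named “pyautogen”, not “autogen”):

```shell
# Autogen is published on PyPI under the name "pyautogen"
pip install pyautogen
```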

As of now, Autogen supports only OpenAI models. To use agents effectively, we need to configure our models. We can configure multiple OpenAI models and use the ones we need. There are different ways to configure models; here, we will define them in a JSON file.

JSON File

#OAI_CONFIG_LIST
[
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key here>"
    },
    {
        "model": "gpt-4",
        "api_key": "<your Azure OpenAI API key here>",
        "base_url": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2023-07-01-preview"
    },
    {
        "model": "gpt-3.5-turbo",
        "api_key":  "<your OpenAI API key here>"
    }
]

Now, define a config list for models.

import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    filter_dict={
        "model": {
            "gpt-3.5-turbo"
        }
    }
)

This method first searches for OAI_CONFIG_LIST in an environment variable. If that fails, it looks for an OAI_CONFIG_LIST JSON file in the current directory. The filter_dict filters the models based on the given parameters; here, it keeps only the gpt-3.5-turbo configuration.
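The filtering idea itself is simple: keep only the config entries whose fields match the allowed values. Here is a plain-Python sketch of that behavior (an illustration, not the actual Autogen internals):

```python
# A plain-Python sketch of what filter_dict does: keep only the configs
# whose fields match the allowed values. Not the real Autogen internals.
def filter_configs(configs, filter_dict):
    return [
        cfg for cfg in configs
        if all(cfg.get(key) in allowed for key, allowed in filter_dict.items())
    ]

configs = [
    {"model": "gpt-4"},
    {"model": "gpt-3.5-turbo"},
]
# Only the gpt-3.5-turbo entry survives the filter
print(filter_configs(configs, {"model": {"gpt-3.5-turbo"}}))
```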

Configuration for LLM

Now, we define the configuration for the LLM. In this example, we register a “python” function tool that runs cells in an IPython notebook; the agent will use it to accomplish a simple graph-plotting task.

llm_config = {
    "functions": [
        {
            "name": "python",
            "description": "run cell in ipython and return the execution result.",
            "parameters": {
                "type": "object",
                "properties": {
                    "cell": {
                        "type": "string",
                        "description": "Valid Python cell to execute.",
                    }
                },
                "required": ["cell"],
            },
        },
    ],
    "config_list": config_list,
    "timeout": 120,
}

We shall define the function for running Python scripts in the IPython notebook.

from IPython import get_ipython

def exec_python(cell):
    # Run the cell in the active IPython shell and capture the result
    ipython = get_ipython()
    result = ipython.run_cell(cell)
    log = str(result.result)
    # Append any errors to the log so the agent can see and fix them
    if result.error_before_exec is not None:
        log += f"\n{result.error_before_exec}"
    if result.error_in_exec is not None:
        log += f"\n{result.error_in_exec}"
    return log
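Note that get_ipython() returns None outside an IPython session. If you are experimenting in a plain Python script, a rough fallback (a sketch under that assumption, not part of the article’s notebook workflow) is to run the cell with exec and capture whatever it prints:

```python
import io
from contextlib import redirect_stdout

def exec_python_plain(cell: str) -> str:
    """Hypothetical fallback for plain scripts (no IPython shell available):
    execute the cell and capture printed output, reporting errors as text."""
    buf = io.StringIO()
    try:
        with redirect_stdout(buf):
            exec(cell, {})
    except Exception as e:
        # Return the error to the caller (the agent) instead of crashing
        return f"{buf.getvalue()}\nError: {e}"
    return buf.getvalue()

print(exec_python_plain("print(2 + 3)"))
print(exec_python_plain("1/0"))
```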

Using Assistant Agent and User Proxy Agent

We are almost done building our multi-agent system. The only missing pieces are the agents we discussed earlier: an Assistant Agent and a User Proxy Agent. This is how we can define them.

chatbot = autogen.AssistantAgent(
    name="chatbot",
    system_message="For coding tasks, only use the functions you have been provided with. \
                    Reply TERMINATE when the task is done.",
    llm_config=llm_config,
)

# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    is_termination_msg=lambda x: x.get("content", "") and x.get("content", "").\
                       rstrip().endswith("TERMINATE"),
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "coding"},
)

The UserProxyAgent has a human_input_mode parameter, which puts an actual human in the agent loop depending on its value. When set to ALWAYS, it asks for human input after every response; with TERMINATE, it asks only at the end of the execution; and with NEVER, it never asks for user input.
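The is_termination_msg lambda above decides when the conversation ends. Here is a standalone version of the same check, so you can see exactly which messages count as terminal:

```python
# A standalone version of the is_termination_msg check used above: a message
# terminates the chat only if it has content ending in "TERMINATE".
def is_termination_msg(msg: dict) -> bool:
    content = msg.get("content", "")
    return bool(content) and content.rstrip().endswith("TERMINATE")

print(is_termination_msg({"content": "All done. TERMINATE"}))  # ends the chat
print(is_termination_msg({"content": "Still working on it"}))  # keeps going
print(is_termination_msg({"content": None}))                   # keeps going
```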

Register the functions with the user proxy agent.

user_proxy.register_function(
    function_map={
        "python": exec_python,
    }
)

Now, run the agents.

# start the conversation
user_proxy.initiate_chat(
    chatbot,
    message="plot a curve for sin wave",
)

Running the agents will print logs on your system showing what is happening.

Execution Log


In the execution log above, you can see that the Assistant agent “chatbot” generates the code, the code is run, and any errors are caught. The user proxy then sends the error back to the assistant, which runs fix-up code, in this case installing Matplotlib, and finally runs the final code and returns the output.

We can also extend this by adding another Assistant agent, such as a Critic or Reviewer, to the conversation. This helps make the output more polished. Here is how you can do that.

critic = autogen.AssistantAgent(
    name="Critic",
    system_message="""Critic. You are a helpful assistant highly skilled in
evaluating the quality of a given visualization code by providing a score
from 1 (bad) - 10 (good) while providing clear rationale.
YOU MUST CONSIDER VISUALIZATION BEST PRACTICES for each evaluation.
Specifically, you can carefully evaluate the code across the following dimensions:
- bugs (bugs): are there bugs, logic errors, syntax errors, or typos? Are there
  any reasons why the code may fail to compile? How should it be fixed? If ANY
  bug exists, the bug score MUST be less than 5.
- Data transformation (transformation): Is the data transformed appropriately
  for the visualization type? E.g., is the dataset appropriately filtered,
  aggregated, or grouped if needed? If a date field is used, is the date field
  first converted to a date object, etc.?

YOU MUST PROVIDE A SCORE for each of the above dimensions.
{bugs: 0, transformation: 0, compliance: 0, type: 0, encoding: 0, aesthetics: 0}
Do not suggest code.
Finally, based on the critique above, suggest a concrete list of actions that
the coder should take to improve the code.
""",
    llm_config=llm_config,
)

# Put the user proxy, the assistant ("chatbot"), and the critic in a group chat
groupchat = autogen.GroupChat(agents=[user_proxy, chatbot, critic], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(
    manager,
    message="plot a curve for inverse sin wave in the -pi/2 to pi/2 region",
)

The above run adds a critic that suggests improvements for the plot, and the Assistant tries to generate code compliant with the critique. Whenever we need more than two agents, we use a group chat with a chat manager.

In this example, we have used GPT-3.5. For more complicated coding and reasoning tasks, GPT-4 is preferred; with capable models like GPT-4, we can do more with fewer agents. GPT-3.5 also sometimes tends to get stuck in a loop, so GPT-4 is a much better choice for serious applications. There is also a new experimental agent type, the EcoAssistant, under development (code). This agent addresses the higher cost of using capable models like GPT-4 via a model hierarchy: the conversation starts with cost-effective models and escalates to the capable yet costly ones only if the end goal is not achieved. One significant benefit of this approach is its synergistic effect: as the agents share a single database, code generated by the bigger models can later be retrieved by the smaller ones, improving efficiency and reducing costs.
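The escalation idea behind such a model hierarchy can be sketched in a few lines. Here, call_model and is_good_enough are hypothetical stand-ins (not Autogen or EcoAssistant APIs), just to show the cheap-first, escalate-on-failure control flow:

```python
def call_model(model: str, task: str) -> str:
    # Hypothetical stand-in for an LLM call; the cheap model "fails" on
    # hard tasks here so we can see the escalation path.
    if model == "cheap" and "hard" in task:
        return "I don't know"
    return f"solved: {task}"

def is_good_enough(answer: str) -> bool:
    # Hypothetical success check (e.g. tests passing, a critic's score).
    return answer.startswith("solved")

def cascade(task, models=("cheap", "costly")):
    """Try models from cheapest to most capable; stop at the first good answer."""
    for model in models:
        answer = call_model(model, task)
        if is_good_enough(answer):
            return model, answer
    return models[-1], answer

print(cascade("easy task"))   # the cheap model suffices
print(cascade("hard task"))   # escalates to the costly model
```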

Real-world Use Cases

The scope of AI agents in the real world is immense, and many companies have started implementing agents in their existing systems. Here are a few use cases where AI agents can be very useful.

  • Personal Agents: One of the most crucial uses of AI agents would be a Jarvis-like personal assistant on electronic devices that accomplishes tasks based on text, voice, or gesture commands.
  • AI Instructors: Chat models can only accomplish so much, but an AI agent with tools can do much more. AI instructors in various fields, such as education, law, and therapy, can be helpful to many users.
  • Software UX: Software user experience can be enhanced manifold by implementing efficient agents. Instead of manually browsing and clicking buttons to get things done, AI agents can accomplish them automatically from voice commands, such as ordering food, booking a cab, or shopping.
  • Spatial Computing: AI agents will be the heralds of spatial computing, where traditional computers seamlessly blend with the real world. Agents can process the data in your surroundings to extract useful information and perform complicated tasks.

Conclusion

AI agents are growing in popularity, and there is little doubt that, in the times to come, they will be integrated into most software systems in one way or another. We are still at the earliest stage of agent development, comparable to the internet in the 90s. Before long, there will be much better agents solving novel problems, and libraries and tools like Langchain will only continue to evolve.

Key Takeaways

  • Autogen is an open-source multi-agent application framework from Microsoft.
  • Autogen lets us seamlessly build a multi-agent system to solve complicated tasks.
  • It provides Assistant Agents and User proxy agents to carry out tasks such as code generation, text generation, etc.
  • While it is OK to use GPT-3.5 for basic things like text generation, more complicated coding tasks require capable models like GPT-4.

Frequently Asked Questions

Q1. What is Autogen by Microsoft?

A. Autogen is an open-source Python framework for building a personalized multi-agent system as a high-level abstraction.

Q2. What is Autogen used for?

A. Autogen provides a high-level abstraction for building multi-agent chat solutions for complex LLM workflows.

Q3. What are AI agents?

A. AI agents are software programs that interact with their environment, make decisions, and act to achieve an end goal.

Q4. What is the best LLM to use with AI agents?

A. This depends on your use case and budget. GPT-4 is the most capable but expensive, while GPT-3.5 and Cohere models are less capable but fast and cheap.

Q5. What is the difference between chains and agents?

A. Chains are hard-coded sequences of actions to follow, while agents use LLMs and other tools (including chains) to reason and act based on the information they receive.
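The distinction can be made concrete in a few lines. The one-argument “tools” below are purely illustrative, and a simple length check stands in for the LLM’s reasoning:

```python
# Hypothetical one-argument "tools", purely for illustration.
def translate(text: str) -> str:
    return f"translated({text})"

def summarize(text: str) -> str:
    return f"summary({text})"

# A chain is a fixed, hard-coded sequence of steps.
def chain(text: str) -> str:
    return summarize(translate(text))

# An agent chooses its next step itself; a length check stands in for
# the LLM's reasoning here.
def agent(text: str) -> str:
    tool = summarize if len(text) > 20 else translate
    return tool(text)

print(chain("bonjour"))
print(agent("bonjour"))
```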

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
