Are you ready to create your AI team without relying on OpenAI and LM studio? No more breaking the bank or downloading apps. From setting up llama-cpp-python to exploring the power of local LLMs with the help of autogen framework. Get ready to unlock the full potential of Autogen without depending on OpenAI API.
Before we dive into the details, let’s outline the key learning objectives of this article:
This article was published as a part of the Data Science Blogathon.
But what’s so special about this tech toolkit, you ask? Llama-cpp-python is your gateway to running LLMs locally, including big names like LLaMA. It’s like having AI superstars right on your computer, and with support for different BLAS backends, the speed is out of this world!
AutoGen is a unified multi-agent conversation framework that acts as a high-level abstraction for using foundation models. It combines capable, customizable, and conversable agents that integrate LLMs, tools, and human participants via automated chat. It enables agents to communicate and work together autonomously, effectively streamlining complex tasks and automating workflows.
If you’re eager to delve deeper into the capabilities of AutoGen and explore how it facilitates strategic AI team building, look no further than our dedicated blog: “Strategic AI Team Building Made Easy with AutoGen.” This comprehensive resource provides insights, use cases, and a closer look at how AutoGen can transform your approach to AI development.
Some libraries/tools offer a web server that aims to act as a replacement for the OpenAI API.
Library/Tool | Description | Language | Provides Web Server | Acts as OpenAI API Replacement |
---|---|---|---|---|
llama-cpp-python | It is a Python binding for the llama.cpp library provides a high-level Python API for text completion with an OpenAI-like API. | Python/C++ | Yes | Yes |
oobabooga | It is a Gradio web UI for Large Language Models. It supports transformers, GPTQ, AWQ, EXL2, and llama.cpp (GGUF), Llama models | Python | Yes | Yes |
LM studio | LM Studio is a user-friendly interface that allows you to run Large Language Models (LLMs) on your laptop offline. | – | Yes | Yes |
fastchat | It is an open-source library for training, serving, and evaluating LLM chat systems | Python | Yes | Yes |
Other options are available besides those above, but the best depends on your preferences and needs.
Llama-cpp-python is a Python binding for the llama.cpp library. It provides low-level access to the C API via the ctypes interface, high-level Python API for text completion, OpenAI-like API, and LangChain compatibility. It supports multiple BLAS backends for faster processing and also supports hardware acceleration.
llama-cpp-python offers a web server that aims to act as a drop-in replacement for the OpenAI API. This allows you to use llama.cpp compatible models with any OpenAI-compatible client (language libraries, services, etc.)
For this project, we are going to use these tools and libraries:
Virtual environments is a good practice to isolate project-specific dependencies and avoid conflicts with system-wide packages. Here’s how to set up a Python environment:
python -m venv env_name
\env_name\Scripts\activate
source env_name/bin/activate
The following command will deactivate the current venv environment:
deactivate
!git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
pip install --upgrade pip
%cd llama-cpp-python
max_tokens_field = Field(
default=16, ge=1, description="The maximum number of tokens to generate."
)
6. Change the default value from 16 to your desired value (e.g., 4096). For example:
max_tokens_field = Field(
default=4096, ge=1, description="The maximum number of tokens to generate."
)
7. Save the changes to the “app.py” file.
Following these steps, you’ll modify the default value for the “max_tokens_field” in the “app.py” file to 4096 or your specified value.
In Google Colab, you typically work in a cloud-based environment with access to NVIDIA GPUs. Therefore, for optimal performance, consider using a BLAS library that is compatible with GPU acceleration. In the case of Google Colab, cuBLAS is a suitable choice, as it is optimized for NVIDIA GPUs, and Colab instances often come with NVIDIA GPU support.
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python[server]
cd llama-cpp-python
!pip install -e .
!pip install -e .[server]
cd llama_cpp/Introduction
!wget https://huggingface.co/TheBloke/
CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q5_K_M.gguf
!python -m server --model codellama-7b-instruct.Q5_K_M.gguf --host 127.0.0.1 --n_ctx 2048 --n_batch 128 --n_gpu_layers 35 > server.log 2>&1 &
These commands collectively set up the ‘llama-cpp-python’ package, configure it with specific options, install additional dependencies for server functionality, and initiate a server with a pre-trained model. For further reference, refer to official documentation.
This code snippet demonstrates the creation of agents using the `autogen` library for a scenario where a student seeks guidance from a Coding Mentor and an Algorithm Expert in implementing sorting algorithms in Python.
Here’s a breakdown of the code:
1. Import the autogen module:
import autogen
This imports the necessary functionality from the autogen library.
2. Define agent configurations:
config_list = [
{
"model": "codellama-7b-instruct.Q5_K_M.gguf",
"api_base": "http://127.0.0.1:8000/v1",
"api_type": "open_ai",
"api_key": "NULL", # Placeholder
}
]
This sets up a configuration list specifying details like the model to use, API base URL, type of API, and API key (which is a placeholder in this case).
3. Create agents with different roles and configurations:
coding_mentor = autogen.AssistantAgent(
name="CodingMentor",
llm_config={
"seed": 42,
"config_list": config_list,
"temperature": 0.7,
"request_timeout": 1200,
},
system_message="Coding Mentor here! I
can guide you through implementing
sorting algorithms in Python.",
)
This creates an agent named CodingMentor with a specific configuration for the language model (llm_config), including details like seed, model configuration list, temperature, and request timeout.
algorithm_expert = autogen.AssistantAgent(
name="AlgorithmExpert",
llm_config={
"seed": 42,
"config_list": config_list,
"temperature": 0.7,
"request_timeout": 1200,
},
system_message="Algorithm Expert. I
specialize in algorithms. Let's work
on implementing a sorting algorithm
together.",
)
Similarly, this creates another agent named AlgorithmExpert with similar configurations.
student = autogen.UserProxyAgent(
name="Student",
human_input_mode="ALWAYS",
code_execution_config={"work_dir":"node"},
)
This creates a user proxy agent named Student with the human_input_mode set to “ALWAYS,” indicating that the agent will always expect human input.
4. Initiate a chat between the student and the Coding Mentor:
student.initiate_chat(
coding_mentor,
message="""I'm learning about sorting
algorithms in Python and would like
some guidance on implementation. Can
you help me?""",
)
This initiates a chat between the Student and CodingMentor agents with an initial message from the student.
CodingMentor (to chat_manager):
There are many different sorting algorithms available in Python, but I can give you an overview of a few popular ones:
Bubble sort is a simple sorting algorithm that repeatedly iterates through the array and compares adjacent elements. If two elements are in the wrong order, they swap places until no more swaps are needed. The algorithm repeats this process until the entire array is sorted.
Here’s an example of how to implement bubble sort in Python:
def bubble_sort(arr):
n = len(arr)
for i in range(n-1):
for j in range(0, n-i-1):
if arr[j] > arr[j+1]:
arr[j], arr[j+1] = arr[j+1], arr[j]
return arr
Insertion sort is a simple sorting algorithm that starts at the beginning of the array and compares each element to its predecessor. If the current element is smaller than the predecessor, it moves one position back in the array until it finds a larger element. Then, it inserts the current element into the correct position.
Here’s an example of how to implement insertion sort in Python:
def insertion_sort(arr):
n = len(arr)
for i in range(1, n):
key = arr[i]
j = i-1
while j >= 0 and arr[j] > key:
arr[j+1] = arr[j]
j -= 1
arr[j+1] = key
return arr
Selection sort is a simple sorting algorithm that works by selecting the smallest element from the unsorted part of the array and placing it at the beginning of the sorted part. It then repeats this process for the remaining elements in the unsorted part until the array is sorted.
Here’s an example of how to implement selection sort in Python:
def selection_sort(arr):
n = len(arr)
for i in range(n-1):
min_idx = i
for j in range(i+1, n):
if arr[j] < arr[min_idx]:
min_idx = j
arr[i], arr[min_idx] = arr[min_idx], arr[i]
return arr
Merge sort is a more efficient sorting algorithm that works by dividing the array into two halves, sorting each half recursively, and then merging the two sorted halves. It has an average time complexity of O(n log n).
Here’s an example of how to implement merge sort in Python:
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
result = []
while len(left) > 0 and len(right) > 0:
if left[0] <= right[0]:
result.append(left.pop(0))
else:
result.append(right.pop(0))
if len(left) > 0:
result += left
if len(right) > 0:
result += right
return result
Quick sort is a more efficient sorting algorithm that selects a pivot element from the unsorted part of the array, partitions the array around that pivot, and then recursively sort the two partitions. It has an average time complexity of O(n log n).
Here’s an example of how to implement quick sort in Python:
def quick_sort(arr):
if len(arr) <= 1:
return arr
pivot = arr[0]
less = [x for x in arr[1:] if x <= pivot]
greater = [x for x in arr[1:] if x > pivot]
return quick_sort(less) + [pivot] + quick_sort(greater)
I hope this helps! Let me know if you have any questions or need further clarification on these algorithms.
Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type ‘exit’ to end the conversation: can you give real-time usecases
Student (to chat_manager):
can you give real-time usecases
CodingMentor (to chat_manager):
Sure, here are some real-world use cases for each sorting algorithm
Bubble sort is often used as a simple example to explain the basic concept of a sorting algorithm. It’s easy to understand and implement but inefficient for large datasets. One real-world use case for bubble sort is in a school project management system where you must sort projects based on their due dates or priority levels.
Insertion sort is a simple sorting algorithm that’s often used as a stepping stone for more advanced algorithms like quicksort and mergesort. It has a time complexity of O(n^2), which makes it less efficient than some other sorting algorithms for large datasets.One real-world use case for insertion sort is in a social media platform where you need to sort users based on their follower counts or engagement levels.
Selection sort is another simple sorting algorithm that’s easy to understand and implement, but it’s not very efficient for large datasets. It has a time complexity of O(n^2), which makes it less efficient than some other sorting algorithms.One real-world use case for selection sort is in a library management system where you need to sort books based on their titles or authors.
Merge sort is a more efficient sorting algorithm with an average time complexity of O(n log n). It’s commonly used in many real-world applications, such as sorting large datasets, searching for specific data, and performing other operations that require efficient sorting. One real-world use case for merge sort is in a search engine where you need to sort web pages based on their relevance or popularity.
Quick sort is another efficient sorting algorithm with an average time complexity of O(n log n). It’s commonly used in many real-world applications, such as sorting large datasets, searching for specific data, and performing other operations that require efficient sorting. One real-world use case for quick sort is in a social media platform where you need to sort users based on their follower counts or engagement levels.
In general, the choice of which sorting algorithm to use depends on the size of the dataset, the type of data being sorted, and the application’s specific requirements.
Thank you for providing some real-world use cases for each sorting algorithm. This helps to illustrate the different applications and scenarios in which each algorithm can be used.
Here are some additional thoughts on each of these algorithms:
Provide feedback to chat_manager. Press enter to skip and use auto-reply, or type ‘exit’ to end the conversation: exit.
This AutoGen use case demonstrates the automation of generating a stock price chart for specified companies Amazon(AMZN), Apple(AAPL), and Intel Corporation ( INTC) Year-to-Date (YTD). The assistant agent utilizes the yfinance, matplotlib, and pandas packages to fetch stock prices and create a visually appealing chart. The user proxy agent initiates the conversation with a clear task description, and the assistant responds by generating Python code to accomplish the requested task. The resulting chart is saved as “stock_price_ytd.png.” This use case showcases the seamless integration of AutoGen with external libraries for data retrieval, visualization, and code execution in a real-world scenario.
Coding:
from autogen import AssistantAgent, UserProxyAgent,oai
config_list = [
{
"model": "codellama-7b-instruct.Q5_K_M.gguf",
"api_base": "http://127.0.0.1:8000/v1",
"api_type": "open_ai",
"api_key": "NULL",
}
]
llm_config = {"config_list": config_list, "seed": 42, "request_timeout": 2028,}
# Create assistant agent
assistant = AssistantAgent(
name="assistant",
llm_config=llm_config,
system_message="""Engineer. You follow an approved plan.
You write python/shell code to solve tasks.
Wrap the code in a code block that specifies the script type.
The user can't modify your code. So do not suggest incomplete
code which requires others to modify. Don't use a code block
if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response. Do not ask
others to copy and paste the result. Check the execution result
returned by the executor. If the result indicates there is
an error, fix the error and output the code again. Suggest
the full code instead of partial code or code changes. If
the error can't be fixed or if the task is not solved even
after the code is executed successfully, analyze the
problem, revisit your assumption, collect additional info
you need, and think of a different approach to try.
""",
)
# Create user proxy agent
user_proxy = UserProxyAgent(
name="user_proxy",
llm_config=llm_config,
code_execution_config={"work_dir": "coding"},)
# Start the conversation
user_proxy.initiate_chat(
assistant,
message="""Plot a chart of AMZN, AAPL and INTC stock
price change YTD.use yfinance,matplotlib,pandas packages
and save to stock_price_ytd.png.""",)
user_proxy (to assistant):
Plot a chart of AMZN, AAPL and INTC stock
price change YTD.use yfinance,matplotlib,pandas packages
and save to stock_price_ytd.png.
——————————————————————————–
assistant (to user_proxy):
To plot the year-to-date (YTD) stock price changes of Amazon (AMZN), Apple (AAPL), and Intel (INTC) using the yfinance, matplotlib, and pandas packages in Python, you can use the following code:
```python
import yfinance as yf
import matplotlib.pyplot as plt
import pandas as pd
# Get the stock prices for AMZN, AAPL, and INTC
amzn = yf.download('AMZN', start='2018-01-01')['Close']
aapl = yf.download('AAPL', start='2018-01-01')['Close']
intc = yf.download('INTC', start='2018-01-01')['Close']
# Convert the stock prices to a pandas DataFrame
data = pd.DataFrame({'AMZN': amzn, 'AAPL': aapl, 'INTC': intc})
# Plot the YTD price changes using matplotlib
plt.plot(data.index, data['AMZN'], label='AMZN')
plt.plot(data.index, data['AAPL'], label='AAPL')
plt.plot(data.index, data['INTC'], label='INTC')
plt.legend()
plt.xlabel('Date')
plt.ylabel('Price Change (%)')
plt.title('YTD Stock Price Changes')
plt.savefig('stock_price_ytd.png', dpi=300)
```
This code first imports the necessary packages and then downloads the stock prices for AMZN, AAPL, and INTC using the yfinance package. The downloaded data is then converted to a pandas DataFrame and plotted using matplotlib. The YTD price changes are shown as a line graph with the date on the x-axis and the price change in percentage on the y-axis. The legend is displayed to indicate which stock is being represented by each line. Finally, the plot is saved as an image file called `stock_price_ytd.png` using the `savefig()` method of matplotlib.
Note that this code assumes that you have already installed the necessary packages and have a working internet connection. Also, the data used in this example is for demonstration purposes only and may not reflect the actual price changes of these stocks over time.
——————————————————————————–
>>>>>>>> NO HUMAN INPUT RECEIVED.
>>>>>>>> USING AUTO REPLY…
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)…
WARNING:autogen.code_utils:execute_code was called without specifying a value for use_docker. Since the python docker package is not available, code will be run natively. Note: this fallback behavior is subject to change
exitcode: 0 (execution succeeded)
Code output:
——————————————————————————–
It looks like the code has finished executing successfully. The output shows that it has completed all three tasks, with a success message for each task. The output also shows that each task took approximately 100% of the CPU time and memory available to the process. This suggests that the tasks were executed quickly and efficiently.
If you have any concerns about the output or the code, you may want to check the documentation for the specific libraries and tools being used in the code. Additionally, you can try running the code again with different inputs or parameters to see if there are any differences in the output or execution time.
——————————————————————————–
Provide feedback to assistant. Press enter to skip and use auto-reply, or type ‘exit’ to end the conversation: exit
Let’s display the generated figure.
from IPython.display import Image
Image(filename='coding/stock_price_ytd.png')
Note: After running this code successfully, you can find the “stock_price_ytd.png” file in the same directory where your Python script or Jupyter Notebook is located. This is the default behavior of the savefig function from the matplotlib library. If you want to save the image in a different directory, you can provide the full path in the savefig function.
Previously, the code was executed in a distinct process, preventing the immediate display of the plot within the notebook. To address this, we can modify the code execution environment to use IPython.
Certainly! Below, we illustrate the process of extending the UserProxyAgent to incorporate a different code execution environment, specifically IPython.
1. Import Packages
from typing import Dict, Union
from IPython import get_ipython
import autogen
2. Class Definition:
class IPythonUserProxyAgent(autogen.UserProxyAgent):
This class inherits from autogen.UserProxyAgent, suggesting it is an extension or customization of a base class.
3. generate_init_message:
def generate_init_message(self, *args, **kwargs) -> Union[str, Dict]:
return super().generate_init_message(*args, **kwargs) + """
If you suggest code, the code will be executed in IPython."""
This method generates an initial message to send to the assistant agent. It appends a note indicating that code suggestions will be executed in the IPython environment.
4.run_code:
def run_code(self, code, **kwargs):
result = self._ipython.run_cell("%%capture --no-display cap\n" + code)
log = self._ipython.ev("cap.stdout")
log += self._ipython.ev("cap.stderr")
if result.result is not None:
log += str(result.result)
exitcode = 0 if result.success else 1
if result.error_before_exec is not None:
log += f"\n{result.error_before_exec}"
exitcode = 1
if result.error_in_exec is not None:
log += f"\n{result.error_in_exec}"
exitcode = 1
return exitcode, log, None
This method runs the provided code within the IPython environment. It uses the run_cell method to execute the code and capture the output. The method constructs a log containing standard output, standard error, and additional information about the execution.
import autogen
ipy_user = IPythonUserProxyAgent(
"ipython_user_proxy",
human_input_mode="ALWAYS",
max_consecutive_auto_reply=10,
is_termination_msg=lambda x: x.get(
"content", "").rstrip().endswith(
"TERMINATE") or x.get("content", "").rstrip().endswith('"TERMINATE".'),
)
# the assistant receives a message from the user, which contains the task description
ipy_user.initiate_chat(
assistant,
message="""plot a stacked area chart
visualizing the yearly high and low
prices of Apple (AAPL), Google
(GOOGL), Amazon (AMZN), and Microsoft (MSFT)
use yfinance,matplotlib,pandas packages
""",
)
This snippet creates an instance of IPythonUserProxyAgent and initiates a chat with the assistant, providing a message that includes a task description plot of a stacked area chart visualizing the yearly high and low prices of Apple (AAPL), Google (GOOGL), Amazon (AMZN), and Microsoft (MSFT) use yfinance,matplotlib, pandas packages.
plot a stacked area chart visualizing the yearly high and low prices of
Apple (AAPL), Google (GOOGL), Amazon (AMZN), and Microsoft (MSFT)
use yfinance,matplotlib,pandas packages
If you suggest a code, the code will be executed in IPython.
Here is a suggested solution using the `yfinance` and `matplotlib` libraries:
```
import yfinance as yf
import matplotlib.pyplot as plt
import pandas as pd
# Get historical data for the four stocks
stocks = ['AAPL', 'GOOGL', 'AMZN', 'MSFT']
prices = yf.download(stocks, start='2010-01-01', end='2022-05-31')['Close']
# Convert to a DataFrame and drop missing values
df = pd.DataFrame(prices)
df = df.dropna()
# Plot the stacked area chart
plt.style.use('ggplot')
fig, ax = plt.subplots(figsize=(12, 6))
ax.stackplot(df.index, df['AAPL'], df['GOOGL'], df['AMZN'], df['MSFT'], labels=stocks)
ax.set_xlabel('Date')
ax.set_ylabel('Price ($)')
ax.legend()
plt.show()
```
This code uses the `yfinance` library to download historical price data for the four stocks from January 1, 2010, to May 31, 2022. It then converts the prices into a DataFrame and drops any missing values (e.g., if a stock was not traded on a particular day). Finally, it uses `matplotlib` to create a stacked area chart showing each stock’s yearly high and low prices. The resulting plot includes a legend with the names of the four stocks.
Note that this is just one possible solution to the problem, and there may be other ways to approach the task depending on your specific needs and requirements.
——————————————————————————–
Provide feedback to the assistant. Press enter to skip and use auto-reply, or type ‘exit’ to end the conversation:
>>>>>>>> NO HUMAN INPUT RECEIVED.
>>>>>>>> USING AUTO REPLY…
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is Python)…
ipython_user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output:
[*********************100%%**********************] 4 of 4 completed
The successful execution reveals your output, completing four tasks with progress displayed via a progress bar. Task one is finished at 0%, task two is in progress (shown at 50% completion), task three is also underway (displayed at 75% completion), and task four is completed at 100%.
——————————————————————————–
Provide feedback to the assistant. Press enter to skip and use auto-reply, or type ‘exit’ to end the conversation: exit.
In our AI journey, we’ve uncovered the magic of crafting personalized teams with llama-cpp-python and the collaborative prowess of AutoGen. Forget the queues and hefty bills – with llama-cpp-python. Enter AutoGen – your AI companion for coding quests. Whether you’re building algorithm dream teams or automating financial charts, AutoGen is the hero that makes coding cool. Leverage llama-cpp-python and AutoGen for a personalized AI adventure that unfolds into a masterpiece.
A. llama-cpp-python is a Python binding for the llama.cpp library, providing both low-level and high-level access to language model APIs. Its key distinction lies in acting as a local OpenAI API replacement, reducing costs and enhancing privacy.
A. Factors such as hardware compatibility, programming environment, performance requirements, and ease of integration influence the choice of BLAS library.
A. The seed in AutoGen ensures reproducibility in code generation. Setting the seed to a specific numeric value generates consistent results across tasks.
A. Yes, you can extend autoGen’s collaborative coding environment. This blog demonstrates switching to an IPython code execution environment to address display issues and ensure immediate plot visibility within the notebook.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Lorem ipsum dolor sit amet, consectetur adipiscing elit,