Deploying Large Language Models in Production: LLMOps with MLflow

Gayathri Nadella Last Updated: 11 Sep, 2024
5 min read

Introduction

Large Language Models (LLMs) are now widely used in a variety of applications, such as machine translation, chatbots, text summarization, and sentiment analysis, driving advancements in the field of natural language processing (NLP). However, deploying and managing these LLMs in real-world use is difficult, which is where LLMOps comes in. LLMOps refers to the set of practices, tools, and processes used to develop, deploy, and manage LLMs in production environments.

MLflow is an open-source platform that provides a set of tools for tracking experiments, packaging code, and deploying models to production. MLflow's centralized model registry simplifies the management of model versions and allows easy sharing and collaborative access among team members, making it a popular choice for data scientists and machine learning engineers looking to streamline their workflows and improve productivity.


Learning Objectives

  • Understand the challenges involved in deploying and managing LLMs in production environments.
  • Learn how MLflow can be used to solve the challenges of deploying large language models in production environments, thereby implementing LLMOps.
  • Explore MLflow's support for popular large language model libraries such as Hugging Face Transformers, OpenAI, and LangChain.
  • Learn how to use MLflow for LLMOps with practical examples.

This article was published as a part of the Data Science Blogathon.

Challenges in Deploying and Managing LLMs in Production Environments

The following factors make managing and deploying LLMs in a production setting difficult:

  1. Resource Management: LLMs need a lot of resources, including GPU, RAM, and CPU, to function properly. These resources can be expensive and difficult to manage.
  2. Model Performance: LLMs can be sensitive to changes in the input data, and their performance can vary with the data distribution. Ensuring good model performance in a production environment can be challenging.
  3. Model Versioning: Updating an LLM can be challenging, especially if you need to manage multiple versions of the model simultaneously. Keeping track of model versions and ensuring that they are deployed correctly can be time-consuming.
  4. Infrastructure: Configuring the infrastructure for deploying LLMs can be challenging, especially if you need to manage multiple models simultaneously.

How to Use MLflow for LLMOps?

MLflow is an open-source platform for managing the machine learning lifecycle. It provides a set of tools and APIs for managing experiments, packaging code, and deploying models. MLflow can be used to deploy and manage LLMs in production environments by following these steps:

  1. Create an MLflow project: An MLflow project is a packaged version of a machine learning application. You can create an MLflow project by defining the dependencies, code, and configuration required to run your LLM.
  2. Train and log your LLM: You can use TensorFlow, PyTorch, or Keras to train your LLM. Once you have trained your model, you can log the model artifacts to MLflow using the MLflow APIs. If you are using a pre-trained model, you can skip the training step.
  3. Package your LLM: Once you have logged the model artifacts, you can package them using the MLflow commands. MLflow can create a Python package that includes the model artifacts, dependencies, and configuration required to run your LLM.
  4. Deploy your LLM: You can deploy your LLM using Kubernetes, Docker, or AWS Lambda. You can use the MLflow APIs to load your LLM and run predictions, as shown in the sketch after this list.
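
To make these steps concrete, below is a minimal end-to-end sketch using MLflow's generic pyfunc flavor. The EchoModel class is a hypothetical stand-in for a real LLM wrapper; the logging, loading, and serving calls are standard MLflow APIs:

import mlflow
import mlflow.pyfunc

# Hypothetical stand-in for a real LLM wrapper (the class name and the
# echo logic are illustrative only).
class EchoModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        # Accept either a pandas DataFrame (MLflow's usual pyfunc input)
        # or a plain list of strings.
        if hasattr(model_input, "iloc"):
            texts = model_input.iloc[:, 0].tolist()
        else:
            texts = list(model_input)
        # A real LLM would run inference here.
        return [f"echo: {text}" for text in texts]

# Step 2: log the model artifacts to an MLflow run.
with mlflow.start_run():
    model_info = mlflow.pyfunc.log_model(
        artifact_path="llm",
        python_model=EchoModel(),
        input_example=["Hello"],
    )

# Step 4: load the logged model back and run predictions.
model = mlflow.pyfunc.load_model(model_info.model_uri)
print(model.predict(["Hello"]))

# The same model URI can also be served as a REST endpoint:
#   mlflow models serve -m <model-uri> -p 5000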

Hugging Face Transformers Support in MLflow

Hugging Face Transformers is a popular open-source library for building natural language processing models. MLflow's built-in support for these models makes them simple to deploy and manage in a production setting. To use Hugging Face Transformers with MLflow, follow these steps:

  • Install MLflow and Transformers: Both libraries can be installed using pip.
!pip install transformers
!pip install mlflow
  • Define your LLM: The transformers library can be used to define your LLM, as shown in the following Python code:
import transformers
import mlflow

chat_pipeline = transformers.pipeline(model="microsoft/DialoGPT-medium")
  • Log your LLM: To log your LLM to MLflow, use the Python code snippet below:
with mlflow.start_run():
  model_info = mlflow.transformers.log_model(
    transformers_model=chat_pipeline,
    artifact_path="chatbot",
    input_example="Hi there!"
  )
  • Load your LLM and make predictions from it:
# Load as interactive pyfunc
chatbot = mlflow.pyfunc.load_model(model_info.model_uri)
# Make predictions
chatbot.predict("What is the best way to get to Antarctica?")
>>> 'I think you can get there by boat'
chatbot.predict("What kind of boat should I use?")
>>> 'A boat that can go to Antarctica.'
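
To address the versioning challenge discussed earlier, the logged chatbot can also be added to MLflow's Model Registry; a minimal sketch, using a hypothetical registry name:

import mlflow

# Register the logged model so its versions can be tracked centrally.
# "dialo-chatbot" is a hypothetical registry name.
mlflow.register_model(model_info.model_uri, "dialo-chatbot")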

OpenAI Support in MLflow

OpenAI is another popular platform for building LLMs. MLflow provides support for OpenAI models, making it easy to deploy and manage them in a production environment. Following are the steps to use OpenAI models with MLflow:

  • Install MLflow and OpenAI: Pip can be used to install OpenAI and MLflow.
!pip install openai
!pip install mlflow
  • Define your LLM: As shown in the following code snippet, you can define your LLM using the OpenAI API:
from typing import List
import openai
import mlflow

# Define a functional model with type annotations

def chat_completion(inputs: List[str]) -> List[str]:
    # Model signature is automatically constructed from
    # type annotations. The signature for this model
    # would look like this:
    # ----------
    # signature:
    #   inputs: [{"type": "string"}]
    #   outputs: [{"type": "string"}]
    # ----------

    outputs = []

    for prompt in inputs:
        # Send each input string to the chat model as the user prompt.
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}]
        )

        outputs.append(completion.choices[0].message.content)

    return outputs
  • Log your LLM: You can log your LLM to MLflow using the following code snippet:
# Log the model and keep the returned info so the model can be loaded back later
model_info = mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=chat_completion,
    pip_requirements=["openai"],
)
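
After logging, the model can be loaded back as a generic pyfunc and queried; a minimal sketch, assuming OPENAI_API_KEY is set in the serving environment:

import mlflow.pyfunc

# Load the logged function-based model and run a test prediction.
model = mlflow.pyfunc.load_model(model_info.model_uri)
print(model.predict(["What is MLflow?"]))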

LangChain Support in MLflow

LangChain is a framework for building LLM-powered applications using a modular approach. MLflow provides support for LangChain models, making it easy to deploy and manage them in a production environment. To use LangChain models with MLflow, you can follow these steps:

  • Install MLflow and LangChain: You can install MLflow and LangChain using pip.
!pip install langchain
!pip install mlflow
  • Define your LLM: The following code snippet demonstrates how to define your LLM using the LangChain API:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Translate everything you see after this into French:

{input}"""

prompt = PromptTemplate(template=template, input_variables=["input"])

llm_chain = LLMChain(
    prompt=prompt,
    llm=HuggingFaceHub(
        repo_id="google/flan-t5-small",
        model_kwargs={"temperature":0, "max_length":64}
    ),
)
  • Log your LLM: You can use the following code snippet to log your LLM to MLflow:
import mlflow

mlflow.langchain.log_model(
    lc_model=llm_chain,
    artifact_path="model",
    registered_model_name="english-to-french-chain-gpt-3.5-turbo-1"
)
  • Load the model: You can load your LLM as a Spark UDF and apply it to a DataFrame, as shown below.
# Load the LangChain model as a Spark UDF

import mlflow.pyfunc
from pyspark.sql import SparkSession

# An active SparkSession is required; create one if it does not already exist.
spark = SparkSession.builder.getOrCreate()

english_to_french_udf = mlflow.pyfunc.spark_udf(
    spark=spark,
    model_uri="models:/english-to-french-chain-gpt-3.5-turbo-1/1",
    result_type="string"
)
english_df = spark.createDataFrame([("What is MLflow?",)], ["english_text"])

french_translated_df = english_df.withColumn(
    "french_text",
    english_to_french_udf("english_text")
)
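
If Spark is not available, the logged chain can also be loaded straight back into Python via MLflow's native LangChain flavor; a minimal sketch, assuming a valid Hugging Face Hub API token is set in the environment:

import mlflow.langchain

# Load the registered chain and run it directly.
loaded_chain = mlflow.langchain.load_model(
    "models:/english-to-french-chain-gpt-3.5-turbo-1/1"
)
print(loaded_chain.run("What is MLflow?"))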

Conclusion

Deploying and managing LLMs in a production environment can be challenging due to resource management, model performance, model versioning, and infrastructure issues. MLflow's tools and APIs for managing the model lifecycle make LLMs much simpler to deploy and administer in a production setting. In this blog, we discussed how to use MLflow to deploy and manage LLMs in a production environment, along with its support for Hugging Face Transformers, OpenAI, and LangChain models. Using MLflow can also improve collaboration between data scientists, engineers, and other stakeholders in the machine learning lifecycle.


Some of the key takeaways are as follows:

  1. MLflow simplifies deploying and managing LLMs in a production environment.
  2. MLflow supports Hugging Face Transformers, OpenAI, and LangChain models.
  3. Resource management, model performance, model versioning, and infrastructure issues can make deploying and managing LLMs in production challenging, but MLflow provides a set of tools and APIs to help overcome these challenges.
  4. MLflow provides a centralized location for tracking experiments, versioning models, and packaging and deploying models.
  5. MLflow integrates easily with existing workflows.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Gayathri is an aspiring AI leader and a highly skilled data scientist with over 11 years of experience in leveraging data to drive business outcomes. She has deep expertise in NLP, computer vision, machine learning, and AI, and a proven track record of delivering insights and recommendations that have helped organizations make informed decisions and deliver real business value. With a strong background in both technical and business domains, she is adept at communicating complex data-driven findings in a clear and concise manner.
As a data science manager, innovator, and researcher, she has led cross-functional teams of data scientists and engineers to deliver high-quality data-driven insights and solutions to clients. She is an excellent communicator, team player, and mentor, with the ability to translate complex technical concepts into plain language for business stakeholders.
As a technical architect, she has designed, implemented, deployed, and maintained AI solutions that enable organizations to leverage their data effectively.

Her experience has taught her that the most important aspect of data science is not just technical expertise, but the ability to work closely with business stakeholders to understand their needs and deliver solutions that meet their business objectives. She always strives to stay at the forefront of the latest data science and technology advancements, and is always eager to learn and grow as a professional.

In her free time, she enjoys reading about the latest advancements in data science and technology.

