Deploying Large Language Models in Production: LLMOps with MLflow

Gayathri Nadella · 12 May 2023

Introduction

Large Language Models (LLMs) are now widely used in applications such as machine translation, chatbots, text summarization, and sentiment analysis, driving rapid advances in natural language processing (NLP). However, deploying and managing these models in real-world systems is difficult, which is where LLMOps comes in. LLMOps refers to the set of practices, tools, and processes used to develop, deploy, and manage LLMs in production environments.

MLflow is an open-source platform that provides a set of tools for tracking experiments, packaging code, and deploying models to production. Its centralized model registry simplifies the management of model versions and makes it easy to share models and collaborate with team members, which is why it is a popular choice among data scientists and machine learning engineers looking to streamline their workflows and improve productivity.
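As a quick sketch of that registry workflow (the run ID and model name below are hypothetical placeholders), a logged model can be registered and promoted like this:

import mlflow
from mlflow import MlflowClient

# Register a previously logged model as a new version of "my-llm"
result = mlflow.register_model("runs:/<run_id>/model", "my-llm")

# Promote that version to the "Staging" stage in the registry
client = MlflowClient()
client.transition_model_version_stage(
    name="my-llm", version=result.version, stage="Staging"
)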


Learning Objectives

  • Understand the challenges involved in deploying and managing LLMs in production environments.
  • Learn how MLflow addresses these challenges, thereby enabling LLMOps for large language models in production.
  • Explore MLflow's support for popular LLM libraries such as Hugging Face Transformers, OpenAI, and LangChain.
  • Learn how to use MLflow for LLMOps with practical examples.

This article was published as a part of the Data Science Blogathon.

Challenges in Deploying and Managing LLMs in Production Environments

The following factors make managing and deploying LLMs in a production setting difficult:

  1. Resource Management:  LLMs need a lot of resources, including GPU, RAM, and CPU, to function properly. These resources can be expensive and difficult to manage.
  2. Model Performance: LLMs can be sensitive to changes in the input data, and their performance can vary with the data distribution. Ensuring consistently good model performance in a production environment can be challenging.
  3. Model Versioning: Updating an LLM can be challenging, especially if you need to manage multiple versions of the model simultaneously. Keeping track of model versions and ensuring that they are deployed correctly can be time-consuming.
  4. Infrastructure: Configuring the infrastructure for deploying LLMs can be challenging, especially if you need to manage multiple models simultaneously.

How to Use MLflow for LLMOps?

MLflow is an open-source platform for managing the machine learning lifecycle. It provides a set of tools and APIs for managing experiments, packaging code, and deploying models. You can use MLflow to deploy and manage LLMs in production environments by following these steps:

  1. Create an MLflow project: An MLflow project is a packaged version of a machine learning application. You can create one by defining the dependencies, code, and configuration required to run your LLM.
  2. Train and log your LLM: You can use TensorFlow, PyTorch, or Keras to train your LLM. Once the model is trained, log its artifacts to MLflow using the MLflow APIs. If you are using a pre-trained model, you can skip the training step.
  3. Package your LLM: Once you have logged the model artifacts, you can package them using the MLflow commands. MLflow can create a Python package that includes the model artifacts, dependencies, and configuration required to run your LLM.
  4. Deploy your LLM: You can deploy your LLM using Kubernetes, Docker, or AWS Lambda, and use the MLflow APIs to load the model and run predictions (see the minimal serving sketch after this list).
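For local testing, MLflow's CLI can also serve any logged model as a REST endpoint in a single command. This is a minimal sketch; the run ID is a hypothetical placeholder for one of your own runs.

!mlflow models serve -m "runs:/<run_id>/model" -p 5000

Once the server is up, you can request predictions by POSTing JSON to the /invocations endpoint on port 5000.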

Hugging Face Transformers Support in MLflow

Hugging Face Transformers is a popular open-source library for building natural language processing models. MLflow's built-in support for these models makes them simple to deploy and manage in a production setting. To use Hugging Face Transformers with MLflow, follow these steps:

  • Install MLflow and Transformers: Both libraries can be installed using pip.
!pip install transformers
!pip install mlflow
  • Define your LLM: The transformers library can be used to define your LLM, as shown in the following Python code:
import transformers
import mlflow

# Create a conversational pipeline backed by Microsoft's DialoGPT model
chat_pipeline = transformers.pipeline(model="microsoft/DialoGPT-medium")
  • Log your LLM: To log your LLM to MLflow, use the Python code snippet below:
# Start a run and log the pipeline, with an example input for the signature
with mlflow.start_run():
  model_info = mlflow.transformers.log_model(
    transformers_model=chat_pipeline,
    artifact_path="chatbot",
    input_example="Hi there!"
  )
  • Load your LLM and make predictions from it:
# Load as an interactive pyfunc
chatbot = mlflow.pyfunc.load_model(model_info.model_uri)

# Make predictions
chatbot.predict("What is the best way to get to Antarctica?")
>>> 'I think you can get there by boat'
chatbot.predict("What kind of boat should I use?")
>>> 'A boat that can go to Antarctica.'
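If you need the underlying pipeline object rather than the generic pyfunc wrapper, the transformers flavor can also load the model back natively. A minimal sketch, reusing the model_info object from the logging step above:

# Reload the logged model as a native transformers pipeline
chat_pipeline_reloaded = mlflow.transformers.load_model(model_info.model_uri)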

OpenAI Support in MLflow

OpenAI is another popular platform for building LLM-powered applications. MLflow provides support for OpenAI models, making it easy to deploy and manage them in a production environment. Follow these steps to use OpenAI models with MLflow:

  • Install MLflow and OpenAI: Both libraries can be installed using pip.
!pip install openai
!pip install mlflow
  • Define your LLM: As shown in the following code snippet, you can define your LLM using the OpenAI API (the openai library reads your API key from the OPENAI_API_KEY environment variable):
from typing import List
import openai
import mlflow

# Define a functional model with type annotations

def chat_completion(inputs: List[str]) -> List[str]:
    # Model signature is automatically constructed from
    # type annotations. The signature for this model
    # would look like this:
    # ----------
    # signature:
    #   inputs: [{"type": "string"}]
    #   outputs: [{"type": "string"}]
    # ----------

    outputs = []

    for text in inputs:
        # Send each input string to the chat model as a user message
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": text}]
        )

        outputs.append(completion.choices[0].message.content)

    return outputs
  • Log your LLM: You can log your LLM to MLflow using the following code snippet:
# Log the model
mlflow.pyfunc.log_model(
    artifact_path="model",
    python_model=chat_completion,
    pip_requirements=["openai"],
)
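Once logged, the model can be loaded back as a generic pyfunc and queried like any other MLflow model. A minimal sketch, where the run ID is a hypothetical placeholder for the run that logged the model:

import mlflow

# Load the functional model back as a pyfunc
model = mlflow.pyfunc.load_model("runs:/<run_id>/model")

# predict() applies chat_completion to each input string
print(model.predict(["What is MLflow?"]))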

LangChain Support in MLflow

LangChain is a framework for building LLM applications using a modular approach. MLflow provides support for LangChain models, making it easy to deploy and manage them in a production environment. To use LangChain models with MLflow, you can follow these steps:

  • Install MLflow and LangChain: You can install both libraries using pip.
!pip install langchain
!pip install mlflow
  • Define your LLM: The following code snippet demonstrates how to define your LLM using the LangChain API:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Translate everything you see after this into French:

{input}"""

prompt = PromptTemplate(template=template, input_variables=["input"])

# Build a chain that sends the formatted prompt to a Hugging Face Hub model
# (requires the HUGGINGFACEHUB_API_TOKEN environment variable to be set)
llm_chain = LLMChain(
    prompt=prompt,
    llm=HuggingFaceHub(
        repo_id="google/flan-t5-small",
        model_kwargs={"temperature": 0, "max_length": 64}
    ),
)
  • Log your LLM: You can use the following code snippet to log your LLM to MLflow:
import mlflow

# Log the chain and register it in the MLflow Model Registry
mlflow.langchain.log_model(
    lc_model=llm_chain,
    artifact_path="model",
    registered_model_name="english-to-french-chain-gpt-3.5-turbo-1"
)
  • Load the model: You can load your LLM and apply it at scale using the code below.
# Load the LangChain model from the registry as a Spark UDF
# (assumes an active SparkSession named `spark`, e.g., on Databricks)

import mlflow.pyfunc

english_to_french_udf = mlflow.pyfunc.spark_udf(
    spark=spark,
    model_uri="models:/english-to-french-chain-gpt-3.5-turbo-1/1",
    result_type="string"
)
english_df = spark.createDataFrame([("What is MLflow?",)], ["english_text"])

french_translated_df = english_df.withColumn(
    "french_text",
    english_to_french_udf("english_text")
)
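To inspect the result, display the translated DataFrame; the UDF fills the french_text column row by row:

# Show the original text alongside its French translation
french_translated_df.show(truncate=False)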

Conclusion

Deploying and managing LLMs in a production environment is challenging due to resource management, model performance, model versioning, and infrastructure issues. MLflow's tools and APIs for managing the model lifecycle make LLMs far simpler to deploy and administer in production. In this blog, we discussed how to use MLflow to deploy and manage LLMs in a production environment, along with its support for Hugging Face Transformers, OpenAI, and LangChain models. MLflow also improves collaboration between data scientists, engineers, and other stakeholders across the machine learning lifecycle.


Some of the key takeaways are as follows:

  1. MLflow makes it possible to deploy and manage LLMs in a production environment.
  2. MLflow offers built-in support for Hugging Face Transformers, OpenAI, and LangChain models.
  3. Resource management, model performance, model versioning, and infrastructure issues can make deploying and managing LLMs in production challenging, but MLflow provides a set of tools and APIs to help overcome these challenges.
  4. MLflow provides a centralized location for tracking experiments, versioning models, and packaging and deploying models.
  5. MLflow integrates easily with existing workflows.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

