Optimizing LLMs with Mistral AI’s New Fine-Tuning APIs

Santhosh Reddy Dandavolu 11 Jun, 2024 • 7 min read

Introduction

Fine-tuning helps large language models align with specific tasks, learn new facts, and incorporate new information. For a well-defined task, a fine-tuned model often outperforms prompting alone, and a smaller fine-tuned model can match or surpass a larger general-purpose one while being faster and cheaper to run. The task alignment is better because the model is trained specifically on examples of that task. Fine-tuning can also teach a model to use advanced tools or follow complicated workflows. This article explores how to fine-tune a large language model using the Mistral AI platform.

Learning Objectives

  • Understand the process and benefits of fine-tuning large language models for specific tasks and advanced workflows.
  • Master the preparation of datasets in JSON Lines format for fine-tuning, including instruction-based and function-calling logic formats.
  • Learn to execute fine-tuning on the Mistral AI platform, configure jobs, monitor training, and perform inference using fine-tuned models.

Dataset Preparation

For dataset preparation, data must be stored in JSON Lines (.jsonl) files, which allow multiple JSON objects to be stored, each on a new line. Datasets should follow an instruction-following format that represents a user-assistant conversation. Each JSON data sample should either consist of only user and assistant messages (“Default Instruct”) or include function-calling logic (“Function-calling Instruct”).

Let us look at a few use cases for constructing a dataset.

Specific Format

Let’s say we want to extract medical information from notes. We can use the medical_knowledge_from_extracts dataset to get the desired output format, which is a JSON object with two fields:

  • conditions
  • interventions, each categorized as a behavioral, drug, or other intervention

Here’s an example of output:

{
  "conditions": "Proteinuria",
  "interventions": [
    "Drug: Losartan Potassium",
    "Other: Comparator: Placebo (Losartan)",
    "Drug: Comparator: amlodipine besylate",
    "Other: Comparator: Placebo (amlodipine besylate)",
    "Other: Placebo (Losartan)",
    "Drug: Enalapril Maleate"
  ]
}

The following code demonstrates how to load this data, format it accordingly, and save it as a .jsonl file. Additionally, you can randomize the order and split the data into training and validation files for further processing.

import pandas as pd
import json

df = pd.read_csv(
    "https://huggingface.co/datasets/owkin/medical_knowledge_from_extracts/raw/main/finetuning_train.csv"
)

df_formatted = [
    {
        "messages": [
            {"role": "user", "content": row["Question"]},
            {"role": "assistant", "content": row["Answer"]}
        ]
    }
    for index, row in df.iterrows()
]

with open("data.jsonl", "w") as f:
    for line in df_formatted:
        json.dump(line, f)
        f.write("\n")
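
The shuffle-and-split step mentioned above can be sketched with the standard library alone. This is a minimal illustration, not part of the Mistral tooling: the helper names, the 10% validation fraction, the fixed seed, and the validation_file.jsonl name are all assumptions, and the sample list is a stand-in for df_formatted.

```python
import json
import random


def split_samples(samples, validation_fraction=0.1, seed=42):
    """Shuffle a copy of the samples and split it into (train, validation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one sample for validation (assumed policy)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def write_jsonl(path, samples):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample) + "\n")


# Stand-in for the df_formatted list built above
samples = [
    {
        "messages": [
            {"role": "user", "content": f"question {i}"},
            {"role": "assistant", "content": f"answer {i}"},
        ]
    }
    for i in range(20)
]

train, validation = split_samples(samples, validation_fraction=0.1)
write_jsonl("training_file.jsonl", train)
write_jsonl("validation_file.jsonl", validation)
```

Fixing the seed keeps the split reproducible, so the same samples stay in the validation file across runs.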

Coding

To generate SQL from the text, we can use the data containing SQL questions and the context of the SQL table to train the model to output the correct SQL syntax.

Each formatted sample pairs a user message, containing the instructions, the question, and the table context, with an assistant message containing the target SQL query.

The code below shows how to format the data for text-to-SQL generation:

import pandas as pd
import json

df = pd.read_json(
    "https://huggingface.co/datasets/b-mc2/sql-create-context/resolve/main/sql_create_context_v4.json"
)

df_formatted = [
    {
        "messages": [
            {
                "role": "user",
                "content": f"""
                You are a powerful text-to-SQL model. Your job is to answer questions about a database. 
                You are given a question and context regarding one or more tables. 
                You must output the SQL query that answers the question.
                
                ### Input: {row["question"]}
                
                ### Context: {row["context"]}
                
                ### Response: 
                """
            },
            {
                "role": "assistant",
                "content": row["answer"]
            }
        ]
    }
    for index, row in df.iterrows()
]

with open("data.jsonl", "w") as f:
    for line in df_formatted:
        json.dump(line, f)
        f.write("\n")
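
One detail worth checking in the code above: a triple-quoted f-string written inside indented code keeps that indentation, so every prompt line in the dataset starts with a run of spaces. If that is not wanted, one option is to dedent the template once and fill it in afterwards. The sketch below assumes this variation is acceptable for your use case; PROMPT_TEMPLATE and build_sql_sample are illustrative names, not part of any library.

```python
import textwrap

# Dedent the template once, before any row values are substituted in
PROMPT_TEMPLATE = textwrap.dedent("""\
    You are a powerful text-to-SQL model. Your job is to answer questions about a database.
    You are given a question and context regarding one or more tables.
    You must output the SQL query that answers the question.

    ### Input: {question}

    ### Context: {context}

    ### Response:
    """)


def build_sql_sample(question, context, answer):
    """Build one user/assistant sample from a question, table context, and SQL answer."""
    prompt = PROMPT_TEMPLATE.format(question=question, context=context)
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]
    }


sample = build_sql_sample(
    "How many heads of the departments are older than 56?",
    "CREATE TABLE head (age INTEGER)",
    "SELECT COUNT(*) FROM head WHERE age > 56",
)
print(sample["messages"][0]["content"])
```

Whichever layout you choose, use the same prompt format at inference time as in the training data.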

Adapt for RAG

We can also fine-tune an LLM to improve its performance for RAG. Retrieval Augmented Fine-Tuning (RAFT) is a method that fine-tunes an LLM to answer questions based on relevant documents while ignoring irrelevant ones, which has been reported to substantially improve RAG performance in specialized domains.

To create a fine-tuning dataset for RAG, begin with the context, which is the document’s original text of interest. Using this context, generate questions and answers to form query-context-answer triplets. Below are two prompt templates for generating these questions and answers:

You can use the prompt template below to generate questions based on the context:

Context information is below.
---------------------
{context_str}
---------------------

Given the context information and not prior knowledge, generate {num_questions_per_chunk} questions based on the context. The questions should be diverse in nature across the document. Restrict the questions to the context of the information provided.

Prompt template to generate answers based on the context and the question from the previous prompt template:

Context information is below.
---------------------
{context_str}
---------------------

Given the context information and not prior knowledge, answer the query. Query: {generated_query_str}
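
As a rough sketch of how the two templates might be used, the code below fills them in with Python string formatting and packs a finished query-context-answer triplet into a fine-tuning sample. The calls to an LLM that would actually generate the questions and answers are left out, and reusing the answer prompt as the user message is an assumption; build_rag_sample is a hypothetical helper.

```python
QUESTION_PROMPT = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, generate "
    "{num_questions_per_chunk} questions based on the context. "
    "The questions should be diverse in nature across the document. "
    "Restrict the questions to the context of the information provided."
)

ANSWER_PROMPT = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query. "
    "Query: {generated_query_str}"
)


def build_rag_sample(context, query, answer):
    """Pack one query-context-answer triplet into a user/assistant sample."""
    user_content = ANSWER_PROMPT.format(context_str=context, generated_query_str=query)
    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": answer},
        ]
    }


context = "Losartan was compared with placebo in patients with proteinuria."
question_prompt = QUESTION_PROMPT.format(context_str=context, num_questions_per_chunk=2)
sample = build_rag_sample(
    context,
    "Which drug was compared with placebo?",
    "Losartan was compared with placebo.",
)
```

Here question_prompt is what you would send to a model to generate candidate questions for a chunk, and sample is one line of the resulting .jsonl file.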

Function Calling

Mistral AI’s function-calling capabilities come from fine-tuning on function-calling data. However, in some cases, the native function-calling features may not be sufficient, especially when working with specific tools and domains. In these instances, it is essential to fine-tune on your own agent data for function calling. This approach can significantly improve the agent’s performance and accuracy, enabling it to select the appropriate tools and actions effectively.

Here is a simple example to train the model to call the generate_anagram() function as needed:

{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant with access to the following functions to help the user. You can use the functions if needed."
    },
    {
      "role": "user",
      "content": "Can you help me generate an anagram of the word 'listen'?"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "TX92Jm8Zi",
          "type": "function",
          "function": {
            "name": "generate_anagram",
            "arguments": "{\"word\": \"listen\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "{\"anagram\": \"silent\"}",
      "tool_call_id": "TX92Jm8Zi"
    },
    {
      "role": "assistant",
      "content": "The anagram of the word 'listen' is 'silent'."
    },
    {
      "role": "user",
      "content": "That's amazing! Can you generate an anagram for the word 'race'?"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "3XhQnxLsT",
          "type": "function",
          "function": {
            "name": "generate_anagram",
            "arguments": "{\"word\": \"race\"}"
          }
        }
      ]
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "generate_anagram",
        "description": "Generate an anagram of a given word",
        "parameters": {
          "type": "object",
          "properties": {
            "word": {
              "type": "string",
              "description": "The word to generate an anagram of"
            }
          },
          "required": ["word"]
        }
      }
    }
  ]
}

How Does the Formatting Work?

  • Store conversational data in a list under the “messages” key.
  • Each message should be a dictionary containing the “role” and “content” or “tool_calls” keys. The “role” must be “user,” “assistant,” “system,” or “tool.”
  • Only “assistant” messages can include the “tool_calls” key, indicating that the assistant uses an available tool.
  • An “assistant” message with a “tool_calls” key cannot have a “content” key and must be followed by a “tool” message, which should then be followed by another “assistant” message.
  • The “tool_call_id” in a “tool” message must match the “id” of a tool call in a preceding “assistant” message.
  • “id” and “tool_call_id” should be randomly generated strings of exactly 9 characters. It’s recommended that these are generated automatically with "".join(random.choices(string.ascii_letters + string.digits, k=9)).
  • The “tools” key must define all tools used within the conversation.
  • Loss computation is only performed on tokens corresponding to “assistant” messages (where “role” == “assistant”).
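
Several of the rules above can be checked mechanically before uploading a file. The sketch below is a lightweight pre-check under stated assumptions, not a replacement for the official validation script: check_sample and new_tool_call_id are hypothetical helpers, and the rule about which message must follow a tool call is deliberately not enforced.

```python
import json
import random
import string

ALLOWED_ROLES = {"user", "assistant", "system", "tool"}


def new_tool_call_id():
    """Random 9-character alphanumeric ID, as recommended above."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=9))


def check_sample(sample):
    """Return a list of rule violations found in one dataset sample."""
    errors = []
    defined_tools = {tool["function"]["name"] for tool in sample.get("tools", [])}
    seen_call_ids = set()
    for i, msg in enumerate(sample.get("messages", [])):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            errors.append(f"message {i}: unknown role {role!r}")
        if "tool_calls" in msg:
            if role != "assistant":
                errors.append(f"message {i}: only assistant messages may have tool_calls")
            if "content" in msg:
                errors.append(f"message {i}: tool_calls and content cannot be combined")
            for call in msg["tool_calls"]:
                name = call["function"]["name"]
                if len(call["id"]) != 9:
                    errors.append(f"message {i}: id {call['id']!r} is not 9 characters")
                if name not in defined_tools:
                    errors.append(f"message {i}: tool {name!r} is not defined under 'tools'")
                seen_call_ids.add(call["id"])
        elif "content" not in msg:
            errors.append(f"message {i}: needs either content or tool_calls")
        if role == "tool" and msg.get("tool_call_id") not in seen_call_ids:
            errors.append(f"message {i}: tool_call_id has no matching earlier id")
    return errors


call_id = new_tool_call_id()
sample = {
    "messages": [
        {"role": "user", "content": "Can you help me generate an anagram of the word 'listen'?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": call_id,
                    "type": "function",
                    "function": {
                        "name": "generate_anagram",
                        "arguments": json.dumps({"word": "listen"}),
                    },
                }
            ],
        },
        {"role": "tool", "content": json.dumps({"anagram": "silent"}), "tool_call_id": call_id},
        {"role": "assistant", "content": "The anagram of the word 'listen' is 'silent'."},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "generate_anagram",
                "description": "Generate an anagram of a given word",
                "parameters": {
                    "type": "object",
                    "properties": {"word": {"type": "string"}},
                    "required": ["word"],
                },
            },
        }
    ],
}
print(check_sample(sample))
```

An empty list means none of the checked rules were violated; each string in a non-empty list names the offending message.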

You can validate the dataset format and also correct it by modifying the script as needed:

# Download the validation script
wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/validate_data.py

# Download the reformat script
wget https://raw.githubusercontent.com/mistralai/mistral-finetune/main/utils/reformat_data.py

# Reformat data
python reformat_data.py data.jsonl

# Validate data
python validate_data.py data.jsonl

Training

Once you have the data files in the right format, you can upload them with the Mistral client, making them available for use in fine-tuning jobs.

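import os
from mistralai.client import MistralClient

api_key = os.environ.get("MISTRAL_API_KEY")
client = MistralClient(api_key=api_key)

# Upload the training file
with open("training_file.jsonl", "rb") as f:
    training_data = client.files.create(file=("training_file.jsonl", f))

# Upload the validation file (file name assumed); its ID is used as validation_data.id below
with open("validation_file.jsonl", "rb") as f:
    validation_data = client.files.create(file=("validation_file.jsonl", f))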

Please note that fine-tuning runs on models hosted on the Mistral platform, so each job is billed. At the time of writing, fine-tuning the Mistral 7B model costs $2 per 1M tokens, with a minimum of $4 per job.
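
The pricing rule above is simple enough to estimate in a few lines. This is a back-of-the-envelope sketch using only the figures quoted in this article ($2 per 1M tokens, $4 minimum); estimate_job_cost is a hypothetical helper, and actual billing may include items not covered here.

```python
def estimate_job_cost(num_tokens, price_per_million=2.0, minimum_fee=4.0):
    """Estimated cost in USD: per-token price, floored at the minimum fee."""
    return max(minimum_fee, num_tokens / 1_000_000 * price_per_million)


small_job = estimate_job_cost(500_000)    # below the minimum, so the minimum applies
large_job = estimate_job_cost(5_000_000)  # 5M tokens at $2 per 1M
print(small_job, large_job)
```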

Once the dataset files are uploaded, we can create a fine-tuning job:

from mistralai.models.jobs import TrainingParameters

created_jobs = client.jobs.create(
    model="open-mistral-7b",
    training_files=[training_data.id],
    validation_files=[validation_data.id],
    hyperparameters=TrainingParameters(
        training_steps=10,
        learning_rate=0.0001,
    )
)

created_jobs

The parameters are as follows:

  • model: the model you want to fine-tune. You can use open-mistral-7b or mistral-small-latest.
  • training_files: a collection of training file IDs, which can include one or more files.
  • validation_files: a collection of validation file IDs, which can include one or more files.
  • hyperparameters: two adjustable hyperparameters, “training_steps” and “learning_rate”, that users can modify.

For LoRA fine-tuning, the recommended learning rate is 1e-4 (default) or 1e-5.

Here, the learning rate specified is the peak rate rather than a flat rate. The learning rate warms up linearly and decays by cosine schedule. During the warmup phase, the learning rate increases linearly from a small initial value to a larger value over several training steps. Then, the learning rate decreases following a cosine function.
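
To make the shape of this schedule concrete, here is a small sketch of linear warmup followed by cosine decay. The number of warmup steps, the starting value of zero, and the final value are assumptions chosen for illustration; the platform's actual settings are not stated in this article.

```python
import math


def learning_rate_at(step, total_steps, peak_lr=1e-4, warmup_steps=2, min_lr=0.0):
    """Learning rate at a given step: linear warmup to peak_lr, then cosine decay."""
    if step < warmup_steps:
        # Warmup: rise linearly from 0 toward the peak rate
        return peak_lr * step / warmup_steps
    # Decay: follow half a cosine wave from peak_lr down to min_lr
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Schedule for a 10-step job with a peak rate of 1e-4
schedule = [learning_rate_at(step, total_steps=10) for step in range(11)]
for step, lr in enumerate(schedule):
    print(step, f"{lr:.2e}")
```

The value passed as learning_rate corresponds to peak_lr here: it is reached at the end of warmup and never exceeded.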

We can also add a Weights and Biases integration to monitor and track the training metrics:

from mistralai.models.jobs import WandbIntegrationIn, TrainingParameters
import os

wandb_api_key = os.environ.get("WANDB_API_KEY")

created_jobs = client.jobs.create(
    model="open-mistral-7b",
    training_files=[training_data.id],
    validation_files=[validation_data.id],
    hyperparameters=TrainingParameters(
        training_steps=10,
        learning_rate=0.0001,
    ),
    integrations=[
        WandbIntegrationIn(
            project="test_api",
            run_name="test",
            api_key=wandb_api_key,
        ).dict()
    ]
)

created_jobs

You can also pass the dry_run=True argument to find out how many tokens the model would be trained on.

Inference

Then, we can list jobs, retrieve a job, or cancel a job.

# List jobs
jobs = client.jobs.list()
print(jobs)

# Retrieve a job
retrieved_jobs = client.jobs.retrieve(created_jobs.id)
print(retrieved_jobs)

# Cancel a job
canceled_jobs = client.jobs.cancel(created_jobs.id)
print(canceled_jobs)
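
In practice you will usually want to wait until a job finishes before using the model. The sketch below shows a generic polling loop that is independent of the client: wait_for_job is a hypothetical helper that takes any function returning the current status string. The status names in TERMINAL_STATUSES are assumptions and should be checked against the API reference.

```python
import time

# Assumed terminal status values; check the API reference for the full list
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "FAILED_VALIDATION", "CANCELLED"}


def wait_for_job(fetch_status, poll_seconds=10.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demonstration with a canned sequence of statuses and no real sleeping
fake_statuses = iter(["QUEUED", "RUNNING", "RUNNING", "SUCCESS"])
final_status = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)
```

With the client used in this article, fetch_status could be something like lambda: client.jobs.retrieve(created_jobs.id).status, assuming the returned job object exposes a status attribute.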

Once a fine-tuning job completes, you can get the fine-tuned model name from retrieved_jobs.fine_tuned_model and use it for inference:

from mistralai.models.chat_completion import ChatMessage

chat_response = client.chat(
    model=retrieved_jobs.fine_tuned_model,  # set by the retrieve call above
    messages=[
        ChatMessage(role='user', content='What is the best French cheese?')
    ]
)

# Print the model's reply
print(chat_response.choices[0].message.content)

Local Fine-Tuning and Inference

We can also use open-source libraries from Mistral AI to fine-tune and perform inference on Large Language Models (LLMs) completely locally. Utilize the following repositories for these tasks:

Fine-Tuning: https://github.com/mistralai/mistral-finetune

Inference: https://github.com/mistralai/mistral-inference

Conclusion

In conclusion, fine-tuning large language models on the Mistral platform enhances their performance for specific tasks, integrates new information, and manages complex workflows. You can achieve superior task alignment and efficiency by preparing datasets correctly and using Mistral’s tools. Whether dealing with medical data, generating SQL queries, or improving retrieval-augmented generation systems, fine-tuning is essential for maximizing your models’ potential. The Mistral platform provides the flexibility and power to achieve your AI development goals.

Key Takeaways

  • Fine-tuning large language models significantly improves task alignment, efficiency, and the ability to integrate new and complex information compared to traditional prompting methods.
  • Properly preparing datasets in JSON Lines format and following instruction-based formats, including function-calling logic, is crucial for fine-tuning.
  • The Mistral AI platform offers powerful tools and flexibility for fine-tuning open-source and optimized models, allowing for superior performance in various specialized tasks and applications.
  • Mistral also offers open-source libraries for fine-tuning and inference, which users can utilize locally or on any other platform.

Frequently Asked Questions

Q1. What is the primary benefit of fine-tuning large language models compared to prompting?

A. Fine-tuning significantly improves a model’s alignment with specific tasks, so it performs those tasks better. It also allows the model to incorporate new facts and handle complex workflows more effectively than traditional prompting methods.

Q2. How do we prepare datasets for fine-tuning on the Mistral AI platform?

A. Datasets must be stored in JSON Lines (.jsonl) format, with each line containing a JSON object. The data should follow an instruction-following format that represents user-assistant conversations. The “role” must be “user,” “assistant,” “system,” or “tool.”

Q3. What tools and features does the Mistral AI platform provide for fine-tuning?

A. The Mistral platform offers tools for uploading and preparing datasets, configuring fine-tuning jobs with specific models and hyperparameters, and monitoring training with integrations like Weights and Biases. It also supports performing inference using fine-tuned models, providing a comprehensive environment for AI development.
