Imagine if your virtual assistant could understand and anticipate your needs perfectly. This vision is becoming a reality with advancements in large language models (LLMs). However, to tailor these models to specific tasks, fine-tuning is essential. Think of it as sculpting a rough block into a precise masterpiece. MonsterAPI simplifies this process, making fine-tuning and evaluation accessible and efficient. In this guide, we’ll show you how MonsterAPI helps refine and assess LLMs, turning them into powerful tools for your unique needs.
Large language models have advanced significantly in recent years as the field of natural language processing keeps growing. Many closed-source and open-source models are being published for researchers and developers to advance the AI field. These LLMs perform exceptionally well on general tasks and answer a wide range of queries, but to personalize them and achieve greater accuracy on specific tasks, we need to fine-tune them.
Fine-tuning transforms pre-trained models into context-specific models by training them further on custom, domain-specific datasets. It requires a dedicated dataset to train the LLM, which is then deployed on a server for the intended use cases. Along with fine-tuning, it is also crucial to evaluate these models to measure their effectiveness on the variety of domain-related tasks that a business might intend to perform.
MonsterAPI helps developers and businesses with fine-tuning and evaluation using the ‘lm_eval’ engine. MonsterAPI has designed no-code as well as code-based fine-tuning APIs that simplify the entire process. The following are the benefits of MonsterAPI:
Fine-tuning is a technique for training a pre-trained LLM on a custom dataset for a specific task. It modifies the parameters of the pre-trained LLM so that it evolves into a task-specific LLM while leveraging the vast general knowledge the model already has. Fine-tuning is done through the following process:
LLM evaluation is the assessment of a fine-tuned model's performance and effectiveness on the targeted task that we want to achieve. Evaluation ensures that the model meets the desired accuracy, coherence, and consistency on the validation dataset.
A wide range of evaluation benchmarks, such as MMLU and GSM8K, test the performance of language models on validation datasets. Comparing the resulting scores against published benchmark results reveals areas for further improvement in model performance.
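To make the idea of a benchmark score concrete, here is a minimal sketch of exact-match accuracy, the kind of score typically reported for question-answering tasks. The function name and the sample answers are hypothetical, and real evaluation harnesses apply more careful answer normalization than a simple whitespace trim.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after trimming whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical model answers and gold answers for four validation questions.
predictions = ["18", "3", "70000", "540"]
references = ["18", "3", "72000", "540"]
score = exact_match_accuracy(predictions, references)
print(f"exact-match accuracy: {score:.2f}")
```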
MonsterAPI provides a comprehensive LLM evaluation engine to test and assess the fine-tuned model. The evaluation API can be used as follows:
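import os
import requests
# Evaluation API endpoint
url = "https://api.monsterapi.ai/v1/evaluation/llm"
# Model to evaluate, evaluation engine, and comma-separated list of tasks
payload = {
    "deployment_name": "Model_deployment_name",
    "basemodel_path": "mistralai/Mistral-7B-v0.1",
    "eval_engine": "lm_eval",
    "task": "gsm8k,hellaswag"
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    # Bearer token, as used in the full evaluation example later in this guide
    "authorization": f"Bearer {os.environ['MONSTER_API_KEY']}"
}
response = requests.post(url, json=payload, headers=headers)
print(response.text)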
As seen in the above code snippet, the deployment name, base model path, eval_engine, and evaluation tasks are passed in a POST request to evaluate the model, which results in a comprehensive report of model performance. Now we will look at a step-by-step guide to fine-tuning and evaluating models using MonsterAPI with code examples.
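The request body can also be assembled by a small helper, which keeps the comma-separated task string consistent. This is only a sketch: `build_eval_payload` is a hypothetical helper, not part of the MonsterAPI client, and the field names are copied from the snippet above.

```python
def build_eval_payload(deployment_name, basemodel_path, tasks, eval_engine="lm_eval"):
    """Assemble the JSON body for the evaluation request shown above."""
    cleaned = [t.strip() for t in tasks if t.strip()]
    if not cleaned:
        raise ValueError("at least one evaluation task is required")
    return {
        "deployment_name": deployment_name,
        "basemodel_path": basemodel_path,
        "eval_engine": eval_engine,
        # The snippet above passes tasks as a single comma-separated string.
        "task": ",".join(cleaned),
    }


payload = build_eval_payload(
    "Model_deployment_name",
    "mistralai/Mistral-7B-v0.1",
    ["gsm8k", "hellaswag"],
)
print(payload["task"])
```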
MonsterAPI's LLM fine-tuner is 10X faster and more efficient than its alternatives, at one of the lowest fine-tuning costs among them. It supports fine-tuning a wide range of models for specific tasks, covering text generation, code generation, speech-to-text and text-to-speech translation, and image generation. In this guide, we will learn about the fine-tuning process for text generation models, followed by evaluation of the models using the MonsterAPI LLM evaluation engine.
MonsterAPI uses a network of computing resources built on NVIDIA A100 GPUs, with memory ranging from 8GB to 80GB depending on the size of the model and the hyperparameters configured. Let’s compare the time taken and the cost of fine-tuning models on various platforms to help you choose the right platform for your product.
Platform / service provider | Model name | Time taken | Cost of fine-tuning (USD) |
MonsterAPI | Falcon-7B | 27 min 26 s | $5-6 |
MonsterAPI | Llama-7B | 115 min | $6 |
MosaicML | MPT-7B-Instruct | 2.3 hours | $37 |
Valohai | Mistral-7B | 3 hours | $1.5 |
Mistral | Mistral-7B | 2-3 hours | $4 |
Before we begin fine-tuning the large language model, we need to install the necessary libraries and set up the MonsterAPI key, which is used to initialise the MonsterAPI client and launch a fine-tuning job. Sign up on MonsterAPI to get a free API key for your project. The code snippet below sets up the project environment for our fine-tuning process.
# Install the MonsterAPI client (the leading "!" runs the command from a notebook cell)
!pip install monsterapi==1.0.8
import json
import logging
import os
import requests
import huggingface_hub as hf_hub
from huggingface_hub import HfApi, hf_hub_download, file_exists
from monsterapi import client as mclient
# Add your MonsterAPI key here
os.environ['MONSTER_API_KEY'] = 'YOUR_MONSTER_API_KEY'
# Initialise the MonsterAPI client with the key
client = mclient(api_key=os.environ.get("MONSTER_API_KEY"))
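Hard-coding the key is convenient in a notebook, but in shared code it is often safer to read it from the environment and fail early if it is missing. A minimal sketch, assuming the key has been exported as `MONSTER_API_KEY` before the script starts; `get_monster_api_key` is a hypothetical helper, not part of the `monsterapi` package.

```python
import os


def get_monster_api_key(env_var="MONSTER_API_KEY"):
    """Return the API key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    # Treat the placeholder from the snippet above as "not configured".
    if not key or key == "YOUR_MONSTER_API_KEY":
        raise RuntimeError(f"Set the {env_var} environment variable to a valid key")
    return key
```

The returned value can then be passed to the client as `mclient(api_key=get_monster_api_key())`.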
Once the project environment is set up, we prepare a launch payload that consists of the base model path, LoRA parameters, data source path, and training details such as the number of epochs and the learning rate for our fine-tuning job. Once the launch payload is ready, we call the MonsterAPI client to run the job and get the fine-tuned model without hassle. The code snippet below sets up the launch payload for our fine-tuning job.
# Prepare the launch payload
launch_payload = {
    # Base model and LoRA settings
    "pretrainedmodel_config": {
        "model_path": "huggyllama/llama-7b",
        "use_lora": True,
        "lora_r": 8,
        "lora_alpha": 16,
        "lora_dropout": 0,
        "lora_bias": "none",
        "use_quantization": False,
        "use_gradient_checkpointing": False,
        "parallelization": "nmp"
    },
    # Dataset source and prompt formatting
    "data_config": {
        "data_path": "tatsu-lab/alpaca",
        "data_subset": "default",
        "data_source_type": "hub_link",
        "prompt_template": "Here is an example on how to use tatsu-lab/alpaca dataset ### Input: {instruction} ### Output: {output}",
        "cutoff_len": 512,
        "prevalidated": False
    },
    # Training hyperparameters
    "training_config": {
        "early_stopping_patience": 5,
        "num_train_epochs": 1,
        "gradient_accumulation_steps": 1,
        "warmup_steps": 50,
        "learning_rate": 0.001,
        "lr_scheduler_type": "reduce_lr_on_plateau",
        "group_by_length": False
    },
    "logging_config": { "use_wandb": False }
}
# Launch the fine-tuning job with the configured parameters
ret = client.finetune(service="llm", params=launch_payload)
deployment_id = ret.get("deployment_id")
print(ret)
In the above code, we have the following key configurations for fine-tuning the pre-trained model on a custom dataset.
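As a rough illustration of what the LoRA settings in the payload imply, the sketch below computes the adapter scaling factor and the number of trainable parameters that LoRA adds to a single weight matrix. It uses the standard LoRA formulation (two low-rank factors of shapes d_out x r and r x d_in, with the update scaled by lora_alpha / lora_r). The 4096 x 4096 matrix size is an assumption chosen for illustration, and which layers MonsterAPI actually adapts is not stated in this guide.

```python
def lora_stats(d_in, d_out, r, alpha):
    """Return (scaling, adapter_params, full_params) for one adapted weight matrix."""
    scaling = alpha / r                    # factor applied to the low-rank update
    adapter_params = r * (d_in + d_out)    # A is r x d_in, B is d_out x r
    full_params = d_in * d_out             # parameters in the frozen weight matrix
    return scaling, adapter_params, full_params


# lora_r=8 and lora_alpha=16 come from the launch payload; 4096 is an assumed layer width.
scaling, adapter_params, full_params = lora_stats(d_in=4096, d_out=4096, r=8, alpha=16)
print(f"scaling factor: {scaling}")
print(f"trainable adapter parameters: {adapter_params}")
print(f"share of the full matrix: {adapter_params / full_params:.4%}")
```

Under these assumptions the adapter trains well under one percent of the parameters of the matrix it modifies, which is the main reason LoRA fine-tuning is cheaper than updating all weights.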
After the fine-tuning process, which can take 5-10 minutes, we can confirm the model deployment status and retrieve the fine-tuning job logs to review the training process. See the MonsterAPI official website for more information on LLM fine-tuning.
# Get deployment status
status_ret = client.get_deployment_status(deployment_id)
print(status_ret)
# Get deployment logs
logs_ret = client.get_deployment_logs(deployment_id)
print(logs_ret)
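Because a fine-tuning job runs asynchronously, the status call usually has to be repeated until the job finishes. The following sketch shows one way to poll. The `status` key and the terminal values `"live"`, `"failed"`, and `"terminated"` are assumptions made for illustration; check which fields your `get_deployment_status` response actually contains.

```python
import time


def wait_for_deployment(get_status, deployment_id,
                        terminal=("live", "failed", "terminated"),
                        interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Call get_status(deployment_id) until an assumed terminal status or a timeout."""
    waited = 0.0
    while True:
        result = get_status(deployment_id)
        # 'status' is an assumed field name; adjust it to the real response.
        if str(result.get("status", "")).lower() in terminal:
            return result
        if waited >= timeout_s:
            raise TimeoutError(
                f"deployment {deployment_id} not finished after {timeout_s}s"
            )
        sleep(interval_s)
        waited += interval_s
```

With the client created earlier, this would be called as `wait_for_deployment(client.get_deployment_status, deployment_id)`.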
Once the context-specific model is trained, we evaluate the fine-tuned model using our platform’s LLM evaluation API to test its accuracy. MonsterAPI offers a comprehensive report of model insights based on the chosen evaluation tasks, such as MMLU, gsm8k, hellaswag, arc, and truthfulqa. In the code below, we send a payload to the evaluation API, which evaluates the deployed model and returns the metrics and report at the result URL.
import os
import requests
# Base model and fine-tuned LoRA adapter from the previous steps
base_model = launch_payload['pretrainedmodel_config']['model_path']
lora_model_path = status_ret['info']['model_url']
# Evaluation API URL
url = "https://api.monsterapi.ai/v1/evaluation/llm"
payload = {
    "eval_engine": "lm_eval",
    "basemodel_path": base_model,
    "loramodel_path": lora_model_path,
    "task": "mmlu"
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": f"Bearer {os.environ['MONSTER_API_KEY']}"
}
response = requests.post(url, json=payload, headers=headers)
print(response.text)
# Extract the evaluation deployment ID from the response
response_data = response.json()
serving_params = response_data.get("servingParams", {})
eval_deployment_id = serving_params.get("deployment_id")
# Get the status of the evaluation deployment
eval_status_ret = client.get_deployment_status(eval_deployment_id)
print(eval_status_ret)
# Download the evaluation report from the result URL
result_url = eval_status_ret["info"]["result_url"]
response = requests.get(result_url)
result_json = response.json()
print(result_json)
# Extract the required values from the JSON report
evaluation_metrics = {
    "MMLU": result_json["results"]["mmlu"]["acc,none"]
}
print(evaluation_metrics)
The above code evaluates the fine-tuned model with the ‘lm_eval’ engine on the MMLU benchmark using MonsterAPI. To learn more about model evaluation, see the MonsterAPI evaluation API page.
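If several tasks are evaluated in one run, the metric extraction can be generalized. The sketch below assumes that every task entry follows the same `results` / task name / `"acc,none"` layout seen for MMLU above, which may not hold for all tasks; `extract_accuracies` and the sample numbers are hypothetical.

```python
def extract_accuracies(result_json, tasks, metric_key="acc,none"):
    """Collect one metric per task, returning None where a value is missing."""
    results = result_json.get("results", {})
    return {task: results.get(task, {}).get(metric_key) for task in tasks}


# Made-up report used only to demonstrate the function; not real evaluation output.
sample_report = {"results": {"mmlu": {"acc,none": 0.5}, "hellaswag": {"acc,none": 0.25}}}
metrics = extract_accuracies(sample_report, ["mmlu", "hellaswag", "gsm8k"])
print(metrics)
```

Returning `None` for a missing task, rather than raising a `KeyError`, makes it easier to spot which tasks did not report the expected metric.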
Fine-tuning LLMs significantly enhances their performance on specific tasks, and evaluating these models is crucial to ensure their effectiveness and reliability. Our MonsterAPI platform offers robust tools for fine-tuning and evaluation, streamlining the process and offering precise performance metrics. By leveraging MonsterAPI’s LLM evaluation engine, developers can build high-quality, specialized language models with confidence, ensuring they meet the desired standards and perform well in real-world applications for their context and domain. Thus, the MonsterAPI platform provides a state-of-the-art solution for fine-tuning and evaluation, with a comprehensive report, for developing custom models with a few lines of code.
A. Fine-tuning is the process of adapting the pre-trained weights of a model to a custom dataset of domain-specific tasks and queries. Evaluation is the process of assessing the accuracy of models against industry benchmarks to ensure high-quality model development.
A. MonsterAPI provides hosted APIs for fine-tuning and evaluating LLMs at low cost and with optimized computing resources.
A. Datasets such as text, codebases, images, and videos are used for fine-tuning, depending on the base model selected for the fine-tuning process.