Converting natural language queries into code is one of the toughest challenges in NLP. The ability to turn a plain English question into working code opens up many possibilities for developer productivity and a faster software development lifecycle. This is where Google Gemma, an open-source Large Language Model, comes into play. This guide explores how to fine-tune Google’s Gemma with Unsloth to generate code from natural language queries.
Google Gemma is a family of open-source Large Language Models developed by Google. The models were trained on 6T tokens of text and build on the research behind Google’s Gemini models; they are often described as lighter-weight variants of Gemini. The Gemma family comes in two sizes: a 2 billion parameter model suited to CPU and on-device applications, and a 7 billion parameter model intended for efficient deployment on GPUs and TPUs.
Gemma offers strong comprehension and reasoning ability at scale, with particular strength in text domains. It outperforms other open models of comparable or larger size across a variety of categories, including question answering, commonsense reasoning, mathematics, and science. Google releases fine-tuned checkpoints and an open-source codebase for inference and serving for both models. In this guide, we will work with the 7 billion parameter version of Gemma.
Unsloth, created by Daniel and Michael Han, has quickly emerged as an optimized framework for streamlining the fine-tuning of large language models (LLMs). Known for its speed and memory efficiency, Unsloth can speed up training by up to 30x while cutting memory usage by around 60%. These metrics have made it a go-to framework for developers who want to fine-tune LLMs with precision and speed.
Notably, Unsloth accommodates different hardware setups, spanning NVIDIA GPUs from the Tesla T4 to the H100, and extends its compatibility to AMD and Intel GPUs. The library’s adaptability shows in its use of techniques such as intelligent weight upcasting, which reduces the need to upcast weights during QLoRA and thereby optimizes memory usage. Unsloth also makes efficient use of bfloat16, improving the stability of 16-bit training and speeding up QLoRA fine-tuning.
As an open-source tool licensed under Apache 2.0, Unsloth integrates seamlessly into fine-tuning prominent LLMs such as Mistral 7B, Llama, and Google Gemma, delivering up to 5x faster fine-tuning while cutting memory consumption by around 60%. Its compatibility also extends to complementary techniques like Flash-Attention 2, which accelerates not only inference but fine-tuning as well.
The first step is to prepare the Python environment by downloading and installing the necessary libraries. We will be working in Google Colab to fine-tune the Gemma LLM. To do so, we run the following command:
!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"
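Before loading the model, it can help to confirm that the Colab runtime actually exposes a GPU. This quick check is optional and not part of the original walkthrough:
# Optional: verify that a CUDA-capable GPU is visible in this runtime
import torch
if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - switch the Colab runtime to a GPU before continuing")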
# Import the FastLanguageModel class from the unsloth library.
from unsloth import FastLanguageModel
# Import the torch library.
import torch
# Set the maximum sequence length to 8192 tokens.
max_seq_length = 8192
# Set the data type to None for automatic detection.
dtype = None
# Set the load_in_4bit flag to True to load the model weights in 4-bit precision.
load_in_4bit = True
In this section, we start by downloading the Gemma model:
# Load the pre-trained model from the 'unsloth/gemma-7b-bnb-4bit' repository.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",
    # Set the maximum sequence length to the value defined earlier.
    max_seq_length = max_seq_length,
    # Set the data type to the value defined earlier.
    dtype = dtype,
    # Set the load_in_4bit flag to the value defined earlier.
    load_in_4bit = load_in_4bit,
)
Running this code downloads the 4-bit quantized gemma-7b model from the unsloth repository on the HuggingFace Hub. With the quantized model downloaded, we next create LoRA adapters so that we only have to train a small subset of the parameters.
# Create a PEFT model with the given parameters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,
)
Running this code creates the LoRA adapters for the Gemma 7B model, which we can then fine-tune on datasets of our choice.
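As an optional sanity check (not part of the original walkthrough), the returned PEFT model exposes a helper that reports how small the set of trainable LoRA parameters is compared to the full model; assuming the standard PEFT API, it can be called like this:
# Report the number of trainable (LoRA) parameters versus the total parameter count
model.print_trainable_parameters()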
Now, we will download the dataset and prepare it for fine-tuning. Since our goal is code generation, we will use the Token Bender Code Instructions dataset, which follows Alpaca-style chat formatting. The dataset looks like the one below:
We mainly work with 3 columns: Input, Instruction, and Output. Using these 3 columns, we arrange each row into an Alpaca-style prompt and train the Gemma Large Language Model on this data. First, let’s define a helper function that takes in each row of the data and converts it to the Alpaca-style format.
def formatted_train(x):
    if x['input']:
        formatted_text = f"""Below is an instruction that describes a task. \
Write a response that appropriately completes the request.
### Instruction:
{x['instruction']}
### Input:
{x['input']}
### Response:
{x['output']}<eos>"""
    else:
        formatted_text = f"""Below is an instruction that describes a task. \
Write a response that appropriately completes the request.
### Instruction:
{x['instruction']}
### Response:
{x['output']}<eos>"""
    return formatted_text
The function takes in each row of the dataset and returns it in the corresponding Alpaca format:
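To make the output of this helper concrete, here is a quick illustration on a hand-written sample row; the row below is hypothetical and not taken from the actual dataset:
# A made-up row with the same columns as the dataset
sample_row = {
    "instruction": "Write a Python function that adds two numbers.",
    "input": "",
    "output": "def add(a, b):\n    return a + b",
}

# Print the Alpaca-style training text produced by the helper
print(formatted_train(sample_row))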
Next, we create a function to download the dataset from HuggingFace and transform the dataset with the following formatting.
from datasets import load_dataset, Dataset

def prepare_train_data(data_id):
    data = load_dataset(data_id, split="train")
    data_df = data.to_pandas()
    data_df["formatted_text"] = data_df[["input", "output",
                                         "instruction"]].apply(formatted_train, axis=1)
    data = Dataset.from_pandas(data_df)
    return data
data_id = "TokenBender/code_instructions_122k_alpaca_style"
data = prepare_train_data(data_id)
We pass the data_id to the prepare_train_data function, which downloads the dataset from HuggingFace, applies the formatting function to each row, and stores the resulting Alpaca-format text in the ‘formatted_text’ column of the dataset.
With this, we have completed the preparation of the code dataset for fine-tuning.
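Before moving on to training, a quick optional spot-check of one formatted row helps confirm the template looks right:
# Inspect the first formatted training example
print(data[0]["formatted_text"])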
We now have access to the dataset for fine-tuning. In this section, we will start off by defining the training arguments and finally fine-tune the model. The below code defines the training arguments for fine-tuning Google Gemma Large Language Model:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = data,
    dataset_text_field = "formatted_text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 10,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "paged_adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
The code snippet above configures the training setup for the Large Language Model using the TrainingArguments class from the Transformers library and passes it, along with the other trainer parameters, to the SFTTrainer class.
Here’s a breakdown of the key TrainingArguments: per_device_train_batch_size and gradient_accumulation_steps together determine how many examples contribute to each optimizer update; warmup_steps, learning_rate, and lr_scheduler_type control the linear learning-rate schedule; max_steps caps how long training runs; fp16/bf16 choose the mixed-precision mode based on what the GPU supports; optim selects the memory-efficient paged 8-bit AdamW optimizer; and weight_decay, seed, and output_dir set the regularization, the random seed for reproducibility, and where checkpoints are written.
We pass these Training Arguments to the SFTTrainer through the args parameter. Apart from the TrainingArguments, we also pass in the model, the tokenizer, the training dataset, the name of the text column to train on (dataset_text_field), the maximum sequence length, the number of preprocessing workers (dataset_num_proc), and the packing flag, which controls whether multiple short examples are packed into a single sequence.
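One detail worth noting from these values: the effective batch size per optimizer update is the per-device batch size multiplied by the gradient accumulation steps. A quick worked example:
# Effective batch size per optimizer update
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 8 examples contribute to each weight update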
With the Trainer for our quantized Gemma 7B Large Language Model defined, we start the training process by running the command below:
trainer_stats = trainer.train()
Running the above starts the training process, which can take around 30 minutes in Google Colab. Once it finishes, the model is fine-tuned on the Code Dataset:
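Once training finishes, it is worth saving the LoRA adapters so the fine-tuned weights survive a Colab runtime reset. A minimal sketch using the standard save_pretrained() method; the directory name below is arbitrary:
# Save the LoRA adapter weights and the tokenizer locally (directory name is just an example)
model.save_pretrained("gemma-7b-code-lora")
tokenizer.save_pretrained("gemma-7b-code-lora")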
Now, we will test the fine-tuned Gemma 7B that was trained on the Code Dataset. Before that, let’s define a helper function that will let us create the prompt in Alpaca format.
def format_test(x):
    if x['input']:
        formatted_text = f"""Below is an instruction that describes a task. \
Write a response that appropriately completes the request.
### Instruction:
{x['instruction']}
### Input:
{x['input']}
### Response:
"""
    else:
        formatted_text = f"""Below is an instruction that describes a task. \
Write a response that appropriately completes the request.
### Instruction:
{x['instruction']}
### Response:
"""
    return formatted_text
This format_test() function is very similar to the one we defined during the dataset processing stage. The only difference is that this time we take only the instruction and the input from the data and leave out the output, so that the model generates it.
Let’s try to visualize an example Prompt with this function:
Prompt = format_test(data[155])
print(Prompt)
Now let’s take in the fine-tuned model, give this input, and see what output it generates.
from transformers import TextStreamer

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

inputs = tokenizer(
    [
        Prompt
    ], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 512)
Running this code will stream the output generated by the Large Language Model. It has given the following result:
We see that the model has generated the following code:
import base64

def encrypt(message, key):
    encoded_message = base64.b64encode(message.encode('utf-8'))
    return encoded_message.decode('utf-8')

def decrypt(encrypted_message, key):
    decoded_message = base64.b64decode(encrypted_message.encode('utf-8'))
    return decoded_message.decode('utf-8')

message = "Hello World!"
key = "secret"

encrypted_message = encrypt(message, key)
print(encrypted_message)

decrypted_message = decrypt(encrypted_message, key)
print(decrypted_message)
This code generated by the Gemma 7B LLM works perfectly fine. Let’s try asking another question and see the response. Below is another prompt and its respective answer generated by the fine-tuned Gemma 7B Large Language Model.
Below is the code generated by the Large Language Model for the provided Prompt:
def remove_duplicates(list):
    seen = set()
    result = []
    for item in list:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result

list = [1, 1, 2, 3, 4, 4, 5]
print(remove_duplicates(list))  # [1, 2, 3, 4, 5]
The above code also works perfectly fine. We see that fine-tuning the Google Gemma 7B Large Language Model for only a handful of training steps has produced a capable code-generating model. The LLM is even able to understand the formatting correctly and generate responses in the same Alpaca format.
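If you would rather capture the generated code as a Python string instead of streaming it to the console, you can decode the generated token IDs directly. A small sketch under the same setup as above; the slicing assumes the prompt tokens are returned at the start of the output, which is the usual behaviour for decoder-only models in Transformers:
# Generate without a streamer and decode only the newly generated tokens
outputs = model.generate(**inputs, max_new_tokens = 512)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens = True)
print(response)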
The integration of Google’s Gemma with Unsloth for code generation from natural language queries has shown potential in enhancing developer productivity. Gemma, a robust Large Language Model, can convert English queries into complex code statements, while Unsloth improves training efficiency and memory usage. This synergy enhances code generation capabilities in Natural Language Processing (NLP) applications, fostering new techniques and improving software development efficiency.
Q. What is Google Gemma?
A. Google Gemma is a family of open-source large language models (LLMs). These models are lighter variants of Google’s Gemini models and exhibit strong comprehension and reasoning abilities. Gemma’s capabilities span various tasks like question answering, code generation, and more.
Q. What is Unsloth?
A. Unsloth is an optimized library that accelerates and improves the efficiency of LLM fine-tuning. It provides substantial speed and memory improvements, making it a go-to choice for fine-tuning models like Gemma.
Q. What is the Token Bender Code Instructions dataset?
A. The Token Bender Code Instructions dataset contains instructions and corresponding code outputs in an Alpaca-style chat format.
Q. How is the dataset prepared for fine-tuning?
A. The dataset is first converted into an Alpaca-style format, where each row includes an Instruction, an Input, and the desired code Output. This format helps the model learn the relationship between natural language instructions and code.
Q. Which training arguments are defined for fine-tuning?
A. We define several training arguments: batch size, gradient accumulation steps, learning rate, and the number of training steps. These parameters control how the model learns from the data.