Fine-tuning, Retraining, and Beyond: Advancing with Custom LLMs

Sameer Mahajan 20 Oct, 2023 • 7 min read


I’m pretty sure most of you have already used ChatGPT. That’s great because you’ve taken your first step on a journey we’re about to embark on! You see, when it comes to mastering any new technology, the first thing you do is use it. It’s like learning to swim by jumping into the water!

You might have heard of model consumers, model tuners, and model builders. But hang on, we’re about to break it down even further.


McKinsey frames these roles as takers, shapers, and makers, a classification it presented in its GenAI Recognise session.


We will take a closer look at each of these layers in this article.

Platform Proliferation as a Use Case

To dig even deeper into this, we’ll turn to a real-life example that’ll make everything crystal clear. In today’s tech landscape, it’s a given that most apps need to work on multiple platforms. However, here’s the catch: each platform has its own interface and peculiarities. Extending an application to additional platforms, and then maintaining it across all of them, is equally challenging.

But that’s where GenAI swoops in to save the day. It empowers us to create a unified and user-friendly interface for our applications, regardless of the platforms they cater to. The magic ingredient? Large Language Models (LLMs) transform this interface into a natural and intuitive language.

Linux, Windows, Mac Commands

To make this more specific for even better understanding, let’s say we want to know the exact command to run for different scenarios on our machine, which can be Linux, Windows, or Mac. The following diagram illustrates one scenario:

Figure: the same user request mapped to Linux, Windows, and Mac commands (Source: Author)

Value for End User as well as Application Developer

As an end user, you don’t have to learn or know the commands for each of these platforms and can get things done naturally and intuitively. As the developer of the application, you don’t have to explicitly translate each user-facing interface into each of the underlying supported platforms.

Reference Architecture and Technologies

Figure: reference architecture (Source: Author)

Several LLMs, including GPT-3, GPT-3.5, and GPT-4, are hosted in the cloud by various providers such as OpenAI and Azure OpenAI. They are made easily accessible through various APIs, like completion, chat completion, etc.

AI orchestrators make this access even more seamless and uniform across models and providers. That is why GenAI applications these days typically interact with an AI orchestrator instead of directly with the underlying providers and models. The orchestrator then handles the interaction with the configurable, and possibly multiple, underlying providers and models the application requires.
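This uniform-interface principle can be sketched in a few lines of Python. Everything below is illustrative: the class and stub providers are hypothetical stand-ins, not the API of LangChain or Semantic Kernel, which do far more.

```python
from typing import Callable, Dict


class Orchestrator:
    """Toy orchestrator: one uniform entry point, many providers behind it."""

    def __init__(self) -> None:
        self._providers: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, complete: Callable[[str], str]) -> None:
        # Register a provider's completion callable under a name.
        self._providers[name] = complete

    def complete(self, provider: str, prompt: str) -> str:
        # The application calls this one method; the orchestrator routes
        # the prompt to whichever provider/model is configured.
        return self._providers[provider](prompt)


# Stub "providers" standing in for real OpenAI / Azure OpenAI clients.
orch = Orchestrator()
orch.register("stub-openai", lambda p: "[openai] " + p)
orch.register("stub-azure", lambda p: "[azure] " + p)

print(orch.complete("stub-azure", "list files"))  # [azure] list files
```

Swapping providers then becomes a configuration change rather than a code change in the application.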

You can have a plugin for each of the platforms your application wants to support for flexibility and modularity.  We will deep dive into all the things we can do with these plugins and orchestrators in the sections that follow.

Finally, the application has connector(s) to interact with platforms it wants to support to execute the commands generated by GenAI.
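A connector’s job can be sketched as below. The `run_command` helper is hypothetical (not from any library in this article); a real connector should validate or sandbox the command before executing it, since it comes from a model.

```python
import subprocess


def run_command(command: str, timeout: float = 10.0) -> str:
    """Execute a generated shell command and return its standard output.

    Illustrative sketch only: production code must sanitize/sandbox
    model-generated commands before running them.
    """
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return result.stdout.strip()


print(run_command("echo hello"))  # hello
```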

Reference Technologies

  • AI Orchestrators: LangChain, Semantic Kernel
  • Cloud Models: Azure OpenAI

Configuring config.json from a Semantic Kernel Plugin

There are numerous parameters in the configuration itself that you can tune to achieve the desired results. Here is a typical config.json from a Semantic Kernel plugin:


{
  "schema": 1,
  "description": "My Application",
  "type": "completion",
  "completion": {
    "max_tokens": 300,
    "temperature": 0.0,
    "top_p": 0.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "stop_sequences": ["\n"]
  },
  "input": {
    "parameters": [
      {
        "name": "input",
        "description": "Command Execution Scenario",
        "defaultValue": ""
      }
    ]
  }
}

The “type” specifies the API you want to invoke on the underlying LLM; here we are using the “completion” API. The “temperature” determines the variability, or creativity, of the model. While chatting, for example, you may want the AI to respond with different phrasings at different times, even though they all convey the same intent, to keep the conversation engaging. Here, however, we always want the same precise answer, hence the value of 0. If your result consists of several sections with predefined separators, and you want only the first section (the exact matching command, in our case) returned as the response, you make use of “stop_sequences”, as shown here. Finally, you define your input with all its parameters; there is only one in this case.
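The effect of stop sequences is easy to illustrate client-side: the output is cut at the first occurrence of any stop string. A minimal sketch, assuming a newline separates the command from any trailing explanation:

```python
def apply_stop_sequences(text: str, stop_sequences: list[str]) -> str:
    """Truncate text at the earliest occurrence of any stop sequence,
    mirroring what the completion API does server-side."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "ls -la\nExplanation: lists all files in long format"
print(apply_stop_sequences(raw, ["\n"]))  # ls -la
```

Only the exact command survives; the explanation after the separator is dropped.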

How to Leverage Prompt Engineering?

Now let’s dive into the much-talked-about prompt engineering and how we can leverage it.

System Messages

System messages tell the model exactly how we want it to behave. For example, the Linux bash plugin in our case might have something like the following at the beginning of its skprompt.txt:

You are a helpful assistant that generates commands for Linux bash machines based on user input. Your response should contain ONLY the command and NO explanation. For all the user input, you will only generate a response considering the Linux bash commands to find its solution.

This text serves as the plugin’s system message.

Few-Shot Prompting and Examples

Giving the model a few examples of the questions and corresponding answers you are looking for helps it produce the exact answer you want. This is called few-shot prompting. For example, our Linux bash plugin might have something like the following in its skprompt.txt, after the system message mentioned above:


User: Get my IP

Assistant: curl


User: Get the weather in San Francisco

Assistant: curl




You may want to tune your system to pick the right examples/shots that yield your desired result.
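The overall prompt structure can be assembled programmatically. The `build_prompt` helper below is hypothetical (not part of Semantic Kernel), and the example answer `curl ifconfig.me` is my own illustration, not taken from the plugin above:

```python
def build_prompt(system_message, examples, user_query):
    # Assemble: system message, then few-shot examples, then the live query.
    lines = [system_message, ""]
    for question, answer in examples:
        lines += [f"User: {question}", f"Assistant: {answer}", ""]
    lines += [f"User: {user_query}", "Assistant:"]
    return "\n".join(lines)


prompt = build_prompt(
    "You are a helpful assistant that generates commands for Linux bash machines.",
    [("Get my IP", "curl ifconfig.me")],  # illustrative example pair
    "Show disk usage",
)
print(prompt)
```

The prompt ends with a dangling `Assistant:` so that the completion API naturally continues with the answer to the live query.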

How to Manage AI Orchestration?

We will now put this configuration and prompt engineering together in our simple example and see how we can manage AI orchestration in Semantic Kernel.

import argparse

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureTextCompletion

parser = argparse.ArgumentParser(description='GANC')
parser.add_argument('platform', type=str,
                    help='A platform needs to be specified')
parser.add_argument('--verbose', action='store_true',
                    help='is verbose')
args = parser.parse_args()

kernel = sk.Kernel()

# Read the Azure OpenAI deployment, key, and endpoint from the .env file.
deployment, api_key, endpoint = sk.azure_openai_settings_from_dot_env()
kernel.add_text_completion_service("dv", AzureTextCompletion(deployment, endpoint, api_key))

# Load all platform plugins and pick the one for the requested platform.
platformFunctions = kernel.import_semantic_skill_from_directory("./", "platform_commands")
platformFunction = platformFunctions[args.platform]

user_query = input()
response = platformFunction(user_query)
print(response)

This Python script takes ‘platform’ as a required argument. It picks up the right plugin from the folder ‘platform_commands’ for the specified platform. It then takes the user query, invokes the function, and returns the response.

For your first few use cases, you may want to experiment only up to this point, as LLMs already carry a lot of intelligence. Simple configuration and prompt engineering alone can produce results very close to your desired behavior, and very quickly.

The following techniques are rather advanced at this time, require more effort and knowledge, and should be employed weighing in the return on investment. The technology is still evolving and maturing in this space. We will only take a cursory look at them at this time for completeness and our awareness of what lies ahead.

Fine-tuning LLM Models

Fine-tuning involves updating the weights of a pre-trained language model on a new task and dataset. It is typically used for transfer learning, customization, and domain specialization. There are several tools and techniques available for this. One way to do this is using OpenAI’s CLI tools. You can give it your data and generate training data for fine-tuning with commands like:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>
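The legacy OpenAI fine-tuning format that this CLI works with is JSONL, one prompt/completion pair per line. A sketch of preparing such a file (the pairs shown are illustrative):

```python
import json
import os
import tempfile

# Illustrative prompt/completion pairs for the command-generation use case.
pairs = [
    {"prompt": "Get my IP ->", "completion": " curl ifconfig.me"},
    {"prompt": "List files with details ->", "completion": " ls -la"},
]

# Write one JSON object per line, as the prepare_data tool expects.
path = os.path.join(tempfile.gettempdir(), "finetune_data.jsonl")
with open(path, "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```

The separator at the end of each prompt and the leading space on each completion follow the conventions that `prepare_data` would otherwise suggest adding.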

Then you can create a custom model using Azure AI Studio, providing the fine-tuning data that you prepared earlier.

Building Custom LLMs

If you are brave enough to dive deeper and experiment further, read on! We will look at how to build our own custom models.

Retraining LLM Models

This is very similar to the fine-tuning that we saw earlier. Here is how we can do it using transformers:

from transformers import AutoTokenizer

# Prepare your data
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

# let's say my dataset is loaded into my_dataset
tokenized_datasets = my_dataset.map(tokenize_function, batched=True)

# load your model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

# Train
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir="mydir")
trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_datasets)
trainer.train()

# save your model, which can later be loaded by pointing to the saved directory
trainer.save_model("mydir")

Training from Scratch

Here you start with a known model architecture and train it from scratch. This takes a lot of time, resources, and training data, though the resulting model is completely under your control.

Brand New Models

You can define your own model structure, potentially improving on existing models, and then follow the process above. Amazon’s Titan and CodeWhisperer fall into this category.


Conclusion

GenAI holds immense potential for diverse use cases. This article exemplified its application in multi-platform support and quick solution building. While skepticism surrounds GenAI, the path to harnessing its power is clear. However, the journey becomes intricate when delving into model tuning and training.

Key Takeaways:

  • As you can see, GenAI is fascinating and enables a wide range of use cases.
  • We saw one such use case and looked at how to quickly start building a solution.
  • Some wonder whether GenAI is a bubble. Pick your favorite use case and try it yourself, employing the steps laid out in this article, to answer that for yourself!
  • The process can get complex and laborious very quickly as you enter territories like model tuning, training, and building.

Frequently Asked Questions

Q1. Is GenAI a bubble?

A. I don’t think so. Pick your favorite use case and try it yourself, employing the steps laid out in this article, to answer that for yourself!

Q2. What is the generative AI architecture stack?

A. There are end users, model consumers, model tuners, and model builders.

Q3. What are the three components of generative AI?

A. They are LLMs, providers, and AI orchestrators.

Q4. What is the infrastructure layer of generative AI?

A. It consists of the GPUs, TPUs, and cloud services like OpenAI, Azure OpenAI, etc.

About Sameer Mahajan


I am a seasoned software engineer with over 27 years of industry experience, having worked with prominent companies in India and the United States. My educational background includes being an alumnus of the computer science departments at IIT Bombay and Georgia Tech.

Currently, I am the Principal Architect at GS Lab | GAVS. At GS Lab | GAVS, we are deeply involved in pioneering the Generative AI (GenAI) field. We’ve dedicated substantial effort to developing a structured approach to tackle the challenges in this exciting domain. I invite you to visit our website to stay updated with our latest endeavors and innovations. 

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

