Adapting BERT Through Fine-tuning For Downstream Tasks

ANURAG SINGH CHOUDHARY 03 Aug, 2023 • 6 min read


Adapting BERT for a downstream task means taking the pre-trained BERT model, adding a task-specific layer on top, and training the result on the target task. The model learns task-specific patterns from the training data while drawing on the general language representations acquired during pre-training. The Hugging Face transformers package in Python is a common way to fine-tune BERT: you prepare your training data (input texts and labels), load a class such as BertForSequenceClassification, and train it on your data with a PyTorch training loop or the library's Trainer API. Note that the PyTorch BertForSequenceClassification class has no fit() method; fit() is only available on the Keras-based TFBertForSequenceClassification variant.

Learning Objectives

  1. The objective of this article is to delve into the fine-tuning of BERT.
  2. A thorough analysis will highlight the benefits of fine-tuning for downstream tasks.
  3. The way downstream tasks work will be explained in detail.
  4. A step-by-step overview will be provided for fine-tuning BERT for downstream tasks.

This article was published as a part of the Data Science Blogathon.

How Does BERT Undergo Fine-Tuning?

Fine-tuning adapts a pre-trained BERT model to a specific downstream task by adding a new layer and training the model on labeled data from that task. This process gives the model task-specific knowledge and improves its performance on the target task.

Primary steps in the fine-tuning process for BERT

1: Use the Hugging Face transformers library to load the pre-trained BERT model and tokenizer.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Choose the appropriate device based on availability (CUDA or CPU)
gpu_available = torch.cuda.is_available()
device = torch.device("cuda" if gpu_available else "cpu")

# Load the tokenizer that matches the pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Load the pre-trained encoder with a new, randomly initialized classification head
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

2: Specify the training data for the target task: the input texts and their corresponding labels.

# Specify the input text and the corresponding label
input_text = "This is a sample input text"
labels = torch.tensor([1])  # class index for the sample text

3: Utilize the BERT tokenizer to tokenize the input text.

# Tokenize the input text; return_tensors="pt" yields a tensor of shape (1, sequence_length)
input_ids = tokenizer.encode(input_text, return_tensors="pt")
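
When several texts are tokenized together, they usually have different lengths, so they are padded to a common length and paired with an attention mask that marks real tokens. The tokenizer can do this itself (for example with padding=True); the sketch below only illustrates the idea in plain Python. The helper pad_batch is hypothetical, and the pad id of 0 is an assumption that matches the bert-base-uncased vocabulary.

```python
PAD_ID = 0  # assumption: [PAD] maps to id 0, as in the bert-base-uncased vocabulary


def pad_batch(sequences, pad_id=PAD_ID):
    """Pad lists of token ids to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Illustrative ids only; 101 and 102 are [CLS] and [SEP] in bert-base-uncased
ids, mask = pad_batch([[101, 2023, 102], [101, 2023, 2003, 1037, 102]])
```

Here the shorter sequence receives two padding ids, and its mask ends in two zeros.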

4: Put the model in training mode.

# Set the model to training mode
model.train()

5: To fine-tune the pre-trained BERT model, train the BertForSequenceClassification model, with its new layer on top of pre-trained BERT, on the target task's training data.

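# Set up your dataset, batch size, and other training hyperparameters
dataset_train = ...
batch_size = 32
num_epochs = 3
learning_rate = 2e-5

# Create the data loader for the training set
train_dataloader = torch.utils.data.DataLoader(dataset_train, batch_size=batch_size, shuffle=True)

# Train with a standard PyTorch loop (the PyTorch model has no fit() method).
# Assumes each batch is a dict containing input_ids, attention_mask, and labels.
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
for epoch in range(num_epochs):
    for batch in train_dataloader:
        optimizer.zero_grad()
        loss = model(**batch).loss  # the loss is returned when labels are provided
        loss.backward()
        optimizer.step()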
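
The training set itself is left elided above. As one possible shape for it, here is a minimal sketch of a PyTorch Dataset that wraps already-tokenized examples and returns them as dictionaries. The class name EncodedTextDataset and the toy ids are hypothetical, and the sketch assumes every example has already been padded to the same length.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class EncodedTextDataset(Dataset):
    """Hypothetical dataset wrapping already-tokenized, equal-length examples."""

    def __init__(self, input_ids, attention_mask, labels):
        self.input_ids = torch.tensor(input_ids)
        self.attention_mask = torch.tensor(attention_mask)
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # Keys match the argument names of the model's forward() method
        return {
            "input_ids": self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
            "labels": self.labels[idx],
        }


# Made-up token ids for two short, padded examples
toy_dataset = EncodedTextDataset(
    input_ids=[[101, 2023, 102, 0], [101, 2003, 1037, 102]],
    attention_mask=[[1, 1, 1, 0], [1, 1, 1, 1]],
    labels=[1, 0],
)
toy_loader = DataLoader(toy_dataset, batch_size=2)
first_batch = next(iter(toy_loader))  # dict of stacked tensors
```

Because each item is a dictionary of tensors, the default collation of the data loader stacks them into a batch dictionary that can be unpacked into the model call.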

6: Evaluate the fine-tuned BERT model's performance on the target task.

# Switch the model to evaluation mode
model.eval()

# Calculate the logits (unnormalized class scores) for the input text
with torch.no_grad():
    logits = model(input_ids).logits

# Use the logits to generate predictions for the input text
predictions = logits.argmax(dim=-1)

accuracy = ...
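
The accuracy computation is left open above. One common way to compute it, shown here as a sketch on made-up logits rather than real model output, is to compare the highest-scoring class of each example with its label. The helper accuracy_from_logits and the toy tensors are assumptions for illustration.

```python
import torch


def accuracy_from_logits(logits, labels):
    """Fraction of examples whose highest-scoring class matches the label."""
    predictions = logits.argmax(dim=-1)
    return (predictions == labels).float().mean().item()


# Toy logits for 4 examples and 2 classes (made-up numbers)
toy_logits = torch.tensor([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.8]])
toy_labels = torch.tensor([0, 1, 0, 0])

# Softmax turns logits into probabilities; each row sums to 1
probabilities = torch.softmax(toy_logits, dim=-1)
toy_accuracy = accuracy_from_logits(toy_logits, toy_labels)
```

In this toy batch three of the four predictions match their labels, so the accuracy is 0.75.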

These represent the primary steps involved in fine-tuning BERT for a downstream task. You can utilize this as a foundation and customize it according to your specific use case.

Fine-tuning BERT enables the model to acquire task-specific information, enhancing its performance on the target task. It proves particularly valuable when the target task involves a relatively small dataset, as fine-tuning with the small dataset allows the model to learn task-specific information that might not be attainable from the pre-trained BERT model alone.

Which Layers Undergo Modifications During Fine-tuning?

By default, fine-tuning updates every weight in the model: the layer added on top of BERT, which starts from random initialization, and the pre-trained BERT weights themselves, usually with a small learning rate such as 2e-5. Alternatively, the pre-trained weights can be frozen so that only the added layer is trained. This is cheaper but often gives lower accuracy than updating the whole model.

Typically, the added layer is a classification layer that takes the output of the pre-trained BERT model and generates logits for each class in the target task. Training on the target task's data lets this layer, and any unfrozen BERT weights, acquire task-specific information and improve the model's performance on the target task.

To sum up, the added layer is always trained during fine-tuning, while the pre-trained BERT weights are either updated along with it (the default) or kept fixed when they are frozen explicitly.
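
In PyTorch, which weights receive updates is controlled by each parameter's requires_grad flag. The sketch below shows how the encoder can be frozen so that only the classification head stays trainable. It assumes a tiny, randomly initialized configuration so that it runs without downloading weights; the configuration sizes and the helper count_trainable are illustrative, not values from this article.

```python
from transformers import BertConfig, BertForSequenceClassification

# Assumption: a tiny random model stands in for from_pretrained('bert-base-uncased')
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
tiny_model = BertForSequenceClassification(config)


def count_trainable(model):
    """Number of parameters that an optimizer would update."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


trainable_full = count_trainable(tiny_model)  # default: every weight is trainable

# Freeze the encoder so that only the classification head receives updates
for param in tiny_model.bert.parameters():
    param.requires_grad = False

trainable_head_only = count_trainable(tiny_model)
```

After freezing, only the linear head (hidden_size x num_labels weights plus num_labels biases) remains trainable.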

Downstream Tasks

Downstream tasks are the natural language processing (NLP) tasks that build on pre-trained language representation models such as BERT. Several examples follow.

Text Classification

Text classification involves the assignment of a text to predefined categories or labels. For instance, one can train a text classification model to categorize movie reviews as positive or negative.

Use the BertForSequenceClassification class to adapt BERT for text classification. This class takes input text, such as sentences or paragraphs, and generates logits for every class.
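
To make the "logits for every class" concrete, here is a simplified sketch of the kind of head that sits on top of BERT's pooled output: dropout followed by a linear layer. The class ClassificationHead is a stand-in written for illustration, the random tensor replaces real BERT output, and 768 is the hidden size of BERT-base.

```python
import torch
from torch import nn


class ClassificationHead(nn.Module):
    """Simplified stand-in for the layer placed on top of BERT's pooled output."""

    def __init__(self, hidden_size=768, num_labels=2, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled_output):
        # One unnormalized score per class for each example in the batch
        return self.classifier(self.dropout(pooled_output))


head = ClassificationHead()
fake_pooled = torch.randn(4, 768)  # random stand-in for 4 pooled [CLS] vectors
head_logits = head(fake_pooled)    # shape: (4, 2)
```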


Natural Language Inference

Natural language inference, also called recognizing textual entailment (RTE), determines the relationship between a given premise text and a hypothesis text. To adapt BERT for natural language inference, you can use the BertForSequenceClassification class provided by the Hugging Face transformers library. The premise and hypothesis are tokenized together as a pair, and with the number of labels set to three the class produces logits (unnormalized scores) for each of the three classes (entailment, contradiction, and neutral).
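
BERT receives a text pair as a single sequence of the form [CLS] premise [SEP] hypothesis [SEP], with segment ids that distinguish the two parts. The tokenizer builds this layout when it is given two texts; the sketch below only illustrates it in plain Python. The helper pack_pair is hypothetical, and whitespace splitting is an assumption standing in for WordPiece tokenization.

```python
def pack_pair(premise_tokens, hypothesis_tokens):
    """Build the [CLS] A [SEP] B [SEP] layout with segment ids 0 and 1."""
    tokens = ["[CLS]"] + premise_tokens + ["[SEP]"] + hypothesis_tokens + ["[SEP]"]
    first_len = len(premise_tokens) + 2  # [CLS] + premise + first [SEP]
    token_type_ids = [0] * first_len + [1] * (len(hypothesis_tokens) + 1)
    return tokens, token_type_ids


# Assumption: whitespace splitting stands in for WordPiece tokenization
premise = "a man is playing a guitar".split()
hypothesis = "a person makes music".split()
pair_tokens, pair_segments = pack_pair(premise, hypothesis)
```

The first segment (ids of 0) covers [CLS], the premise, and the first [SEP]; the second segment (ids of 1) covers the hypothesis and the final [SEP].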


Named Entity Recognition

Named entity recognition involves locating and classifying entities mentioned in the text, such as people and locations. The Hugging Face transformers library provides the BertForTokenClassification class to fine-tune BERT for named entity recognition. This class takes the input text and generates logits for each token, indicating the token's class.
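
Because BERT splits words into sub-tokens, word-level entity labels have to be aligned with the token sequence before training. A common convention, sketched here under stated assumptions, is to label only the first sub-token of each word and mask the rest with -100, which PyTorch's cross-entropy loss ignores by default. The helper align_labels, the word ids, and the label ids are all made up for illustration; fast tokenizers expose a similar word-to-token mapping through word_ids().

```python
IGNORE_INDEX = -100  # PyTorch's cross-entropy loss skips targets with this value by default


def align_labels(word_ids, word_labels):
    """Label the first sub-token of each word; mask special tokens and the rest."""
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)
        else:
            aligned.append(word_labels[word_id])
        previous = word_id
    return aligned


# Hypothetical example: the first word is split into two sub-tokens;
# None marks special tokens such as [CLS] and [SEP]
example_word_ids = [None, 0, 0, 1, 2, None]
example_word_labels = [1, 0, 0]  # made-up label ids: 1 = person, 0 = outside
aligned_labels = align_labels(example_word_ids, example_word_labels)
```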



Question Answering

Question answering involves producing an answer to a question based on a given context. To fine-tune BERT for question answering, you can use the BertForQuestionAnswering class offered by the Hugging Face transformers library. This class takes both a question and a context as input and outputs start and end logits for every token; the highest-scoring positions mark the start and end of the answer span within the context.
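
Turning the model's start and end scores into an answer requires choosing a valid span, one where the start does not come after the end. The following is a minimal sketch of that selection step on made-up scores; the helper best_span, the max_answer_len limit, and the toy tokens are assumptions for illustration, not part of the transformers API.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start score + end score with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


# Made-up scores over a toy context; real scores come from the model
context_tokens = ["bert", "was", "released", "by", "google", "in", "2018"]
toy_start = [0.1, 0.0, 0.2, 0.1, 3.0, 0.3, 1.0]
toy_end = [0.0, 0.1, 0.1, 0.2, 2.5, 0.1, 1.5]

start, end = best_span(toy_start, toy_end)
answer = " ".join(context_tokens[start:end + 1])
```

With these scores the best span starts and ends at the same token, so the extracted answer is the single word "google".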

Researchers continue to explore new ways to use BERT and other language representation models in NLP. Pre-trained language representation models like BERT make downstream tasks such as the examples above achievable, and fine-tuned BERT models can be applied to many other NLP tasks as well.



Conclusion

When BERT is fine-tuned, a pre-trained BERT model is adapted to a particular task or domain by updating its parameters using a limited amount of labeled data. For example, using BERT for sentiment analysis requires a dataset of texts and their respective sentiment labels. Fine-tuning typically entails adding a task-specific layer on top of the BERT encoder and training the entire model end-to-end with an appropriate loss function and optimizer.

Key Takeaways

  • Fine-tuning BERT for downstream tasks is a widely used technique that improves the performance of natural language processing models on specific tasks.
  • The process involves adapting the pre-trained BERT model to a particular task by training a new layer on top of the pre-trained model using the target task’s training data. This enables the model to acquire task-specific knowledge and improve its performance on the target task.
  • In general, fine-tuning BERT can be an effective way to improve NLP model performance on a given task.
  • It allows the model to utilize the pre-trained BERT model’s understanding of general language representation while acquiring task-specific information from the target task’s training data.

Frequently Asked Questions

Q1. What does fine-tuning a BERT model mean?

A. Fine-tuning involves training specific parameters or layers of a pre-existing model checkpoint with labeled data from a specific task. This checkpoint is usually a model pre-trained on vast amounts of text data using unsupervised masked language modeling (MLM).

Q2. What is fine-tuning BERT for downstream tasks?

A. During the fine-tuning step, we adjust the already trained BERT model to a specific downstream task by putting a new layer on top of the previously trained model and training it using training data from the target task. This enables the model to acquire task-specific knowledge and enhance its performance on the target task.

Q3. Does fine-tuning improve accuracy?

A. In most cases, yes. Fine-tuning takes a model that has already been trained and continues training it on data relevant to the target task, which usually improves accuracy on that task.

Q4. What are the main tasks that BERT is optimized for?

A. BERT is pre-trained on two NLP tasks: Masked Language Modeling, which takes advantage of its bidirectional context, and Next Sentence Prediction.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion. 
