Enhancing Conversational AI with BERT: The Power of Slot Filling

Shikha Sharma 17 Oct, 2023 • 6 min read


In the era of Conversational AI, chatbots and virtual assistants have become ubiquitous, revolutionizing how we interact with technology. These intelligent systems can understand user queries, provide relevant information, and assist with various tasks. However, achieving accurate and context-aware responses is a complex challenge. One crucial component that aids in this process is slot filling, and the advent of BERT (Bidirectional Encoder Representations from Transformers) has significantly improved its effectiveness. In this article, we will explore the role and implementation of BERT in slot-filling applications, unraveling how it enhances conversational AI systems.

Conversational AI
Source: ScienceDirect

Learning Objectives

  1. Understand the concept and significance of slot filling in Conversational AI.
  2. Explore how BERT enhances slot filling by leveraging its contextual understanding, and learn the steps to implement BERT for slot filling, from data preparation to fine-tuning.
  3. Discover the advantages of using BERT in Conversational AI, including improved user intent recognition.

This article was published as a part of the Data Science Blogathon.

What is Slot Filling?

Slot filling is a vital task in task-oriented conversational systems. It involves extracting specific pieces of information, known as slots, from user queries. For example, in a flight booking scenario, the slots could include the departure city, destination, date, and travel class. The extracted slot values are then used to generate appropriate responses and fulfill the user’s request. Accurate slot filling is critical for understanding user intent and providing personalized and relevant responses.
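As a rough illustration, the outcome of slot filling for a flight booking query can be thought of as a mapping from slot names to values, which the assistant then checks for gaps. The slot names and the `missing_slots` helper below are hypothetical and only meant to show the idea.

```python
# Hypothetical slot schema for a flight-booking assistant (an assumption, not a standard).
REQUIRED_SLOTS = ["departure_city", "destination_city", "date", "travel_class"]

def missing_slots(filled):
    """Return the required slots that have no value yet, in schema order."""
    return [name for name in REQUIRED_SLOTS if not filled.get(name)]

# Slots a model might extract from "Book a flight from New York to London".
filled = {"departure_city": "New York", "destination_city": "London"}

print(missing_slots(filled))  # ['date', 'travel_class']
```

A dialogue manager could use the list of missing slots to decide which follow-up question to ask next.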

Understanding slot filling for Conversational AI with BERT
Source: ResearchGate

The Power of BERT in Slot Filling

BERT’s contextual understanding and pre-training on vast amounts of text data make it a natural fit for slot-filling applications. By leveraging BERT’s capabilities, conversational AI systems can significantly improve their slot extraction accuracy and overall performance.

Here’s how BERT enhances slot filling:

  • Contextualized Representations: BERT captures the contextual information from the entire input sequence, allowing it to understand the relationships between words and phrases. This contextual understanding helps identify slot boundaries and distinguish between similar words or phrases in different contexts.
  • Ambiguity Resolution: User queries often contain ambiguous expressions or abbreviations that require disambiguation. BERT’s ability to grasp the contextual nuances aids in resolving such ambiguities, enabling accurate slot value extraction.
  • Out-of-Vocabulary (OOV) Handling: BERT’s vocabulary is large but finite, so user queries may still contain out-of-vocabulary terms. BERT’s subword tokenization approach handles such terms by breaking them into smaller subword units and representing them with subword embeddings.
  • Fine-Tuning for Slot Filling: BERT’s pre-trained representations can be fine-tuned on slot-filling datasets specific to a particular task or domain. This fine-tuning process adapts BERT to understand and extract slots according to the requirements of the conversational AI system, further improving its performance.
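The subword handling described in the list above can be sketched with a toy greedy, longest-match-first splitter in the spirit of WordPiece. The tiny vocabulary and the `wordpiece` function are illustrative assumptions, not BERT’s actual vocabulary or tokenizer code.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword units."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces

# Toy vocabulary, assumed for illustration only.
TOY_VOCAB = {"heath", "##row", "london", "fly", "##ing"}

print(wordpiece("heathrow", TOY_VOCAB))  # ['heath', '##row']
print(wordpiece("flying", TOY_VOCAB))    # ['fly', '##ing']
print(wordpiece("xyz", TOY_VOCAB))       # ['[UNK]']
```

Even though "heathrow" is not in the toy vocabulary as a whole word, it is still represented through known pieces rather than being discarded.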

Implementation of BERT for Slot Filling

Let’s delve into implementing BERT for slot filling in conversational AI systems.

The following steps outline the process:

Step 1: Data Preparation

The first step involves preparing a labeled dataset for fine-tuning BERT. The dataset consists of user queries annotated with slot labels. Each query is segmented into tokens, and each token is paired with a slot label, typically in the BIO scheme (B- marks the beginning of a slot, I- its continuation, and O a token outside any slot). For instance, the query “Book a flight from New York to London” would be tokenized into [“Book”, “a”, “flight”, “from”, “New”, “York”, “to”, “London”] and labeled as [“O”, “O”, “O”, “O”, “B-fromloc.city_name”, “I-fromloc.city_name”, “O”, “B-toloc.city_name”].
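A minimal sketch of how span annotations could be turned into per-token BIO labels is shown here. The `bio_labels` helper and the span format (start index, exclusive end index, slot name) are assumptions for illustration; the label names follow the ATIS-style convention.

```python
def bio_labels(tokens, spans):
    """Build BIO labels from spans given as (start, end_exclusive, slot_name)."""
    labels = ["O"] * len(tokens)
    for start, end, name in spans:
        labels[start] = "B-" + name          # first token of the slot
        for i in range(start + 1, end):
            labels[i] = "I-" + name          # remaining tokens of the slot
    return labels

tokens = ["Book", "a", "flight", "from", "New", "York", "to", "London"]
spans = [(4, 6, "fromloc.city_name"), (7, 8, "toloc.city_name")]

for token, label in zip(tokens, bio_labels(tokens, spans)):
    print(token, label)
```

Here "New" receives the B- label and "York" the I- label, so the two tokens are recognized as one departure city.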

Steps for Implementation of BERT for Slot Filling
Source: Link Springer

Step 2: BERT Tokenization

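Next, the tokenized queries are converted into BERT’s input format. BERT uses WordPiece tokenization, which splits words into subword units, maps each unit to an index in its vocabulary, and looks up the corresponding subword embedding. Because one word can become several subword units, the word-level slot labels must be aligned with the resulting subword tokens.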
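One common way to keep word-level labels in step with subword tokens is to label only the first piece of each word and mask the rest. The sketch below assumes that convention; `align_labels` and the toy splitter are hypothetical stand-ins for a real WordPiece tokenizer, and -100 is used because it is the default `ignore_index` of PyTorch’s cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label id are skipped by the loss

def align_labels(words, word_label_ids, split_word):
    """Expand word-level label ids to subword level, adding [CLS] and [SEP]."""
    pieces, label_ids = ["[CLS]"], [IGNORE_INDEX]
    for word, label_id in zip(words, word_label_ids):
        sub = split_word(word)
        pieces.extend(sub)
        # Label only the first piece of each word; mask the continuation pieces.
        label_ids.extend([label_id] + [IGNORE_INDEX] * (len(sub) - 1))
    pieces.append("[SEP]")
    label_ids.append(IGNORE_INDEX)
    return pieces, label_ids

# Hypothetical splitter standing in for a real WordPiece tokenizer.
TOY_SPLITS = {"heathrow": ["heath", "##row"]}

def toy_split(word):
    return TOY_SPLITS.get(word, [word])

pieces, label_ids = align_labels(["to", "heathrow"], [0, 1], toy_split)
print(pieces)     # ['[CLS]', 'to', 'heath', '##row', '[SEP]']
print(label_ids)  # [-100, 0, 1, -100, -100]
```

Other conventions exist, such as repeating the word label on every piece; whichever is chosen should be applied consistently in training and inference.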

Step 3: Model Architecture

The slot-filling model architecture typically consists of BERT as the base encoder, followed by a slot classification layer. BERT processes the tokenized input sequence and generates contextualized representations. These representations are then fed into a softmax layer that predicts the slot labels for each token.
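To make the classification layer concrete, here is a small, dependency-free sketch of a linear layer followed by softmax applied to each token representation. The two-dimensional vectors, weights, and label names are made-up values for illustration; in practice BERT’s representations have hundreds of dimensions and the weights are learned.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_tokens(token_vectors, weights, bias, label_names):
    """Apply one linear layer plus softmax to each token vector."""
    predictions = []
    for vec in token_vectors:
        logits = [sum(w * x for w, x in zip(row, vec)) + b
                  for row, b in zip(weights, bias)]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append((label_names[best], probs[best]))
    return predictions

# Made-up values: 2-dimensional token vectors and 3 slot labels.
label_names = ["O", "B-toloc.city_name", "I-toloc.city_name"]
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
bias = [0.0, 0.0, 0.0]
token_vectors = [[2.0, 0.0], [0.0, 3.0]]

for label, prob in classify_tokens(token_vectors, weights, bias, label_names):
    print(label, round(prob, 3))
```

Each token gets its own probability distribution over slot labels, and the highest-scoring label is taken as the prediction.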

Steps for Implementation of BERT for Slot Filling
Source: ResearchGate

Step 4: Fine-Tuning

The pre-trained BERT model is fine-tuned on the labeled slot-filling dataset. During fine-tuning, the model learns to optimize its parameters for the slot-filling task. The loss function is typically the cross-entropy loss, which measures the dissimilarity between predicted slot labels and the ground truth labels.
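The token-level cross-entropy loss can be sketched in plain Python as the average negative log-probability of the correct label. This sketch assumes that positions labeled -100 (for example, special tokens or padding) are skipped, which matches the default `ignore_index` of PyTorch’s cross-entropy loss; the function name is hypothetical.

```python
import math

IGNORE_INDEX = -100  # label id for positions excluded from the loss

def token_cross_entropy(logits, label_ids):
    """Mean negative log-likelihood over tokens whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for scores, label in zip(logits, label_ids):
        if label == IGNORE_INDEX:
            continue
        m = max(scores)
        # log of the softmax normalizer, computed stably
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[label]
        count += 1
    return total / count if count else 0.0

# Two tokens, two labels; the second token is masked out.
loss = token_cross_entropy([[0.0, 0.0], [5.0, 5.0]], [0, IGNORE_INDEX])
print(loss)  # log(2), since the model is undecided between two labels
```

The loss shrinks toward zero as the model assigns more probability to the correct label of each unmasked token.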

Steps for Implementation of BERT for Slot Filling
Source: PaperswithCode

Step 5: Inference

After training, the fine-tuned BERT model is ready for inference. Given a user query, the pipeline tokenizes the query, feeds it through BERT, and predicts a slot label for each token. The slot values can then be extracted from the predicted labels and used to generate appropriate responses.
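Turning per-token predictions into slot values might look like the following sketch, which assumes BIO labels and WordPiece-style "##" continuation pieces. The `extract_slots` helper is hypothetical, and it simply drops an I- label that does not continue an open slot.

```python
def extract_slots(tokens, labels):
    """Group (token, BIO label) pairs into a list of (slot_name, value) tuples."""
    slots, name, words = [], None, []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            # Continuation piece: glue it to the previous word of the open slot.
            words[-1] += token[2:]
            continue
        if label.startswith("I-") and name == label[2:]:
            words.append(token)
            continue
        if name is not None:
            # The open slot ends here.
            slots.append((name, " ".join(words)))
            name, words = None, []
        if label.startswith("B-"):
            name, words = label[2:], [token]
    if name is not None:
        slots.append((name, " ".join(words)))
    return slots

tokens = ["book", "a", "flight", "from", "new", "york", "to", "heath", "##row"]
labels = ["O", "O", "O", "O", "B-fromloc.city_name", "I-fromloc.city_name",
          "O", "B-toloc.airport_name", "I-toloc.airport_name"]

print(extract_slots(tokens, labels))
# [('fromloc.city_name', 'new york'), ('toloc.airport_name', 'heathrow')]
```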

Implementation in Python

Below is the code for implementing slot filling using BERT:

Step 1: Data Preparation

# Prepare your labeled dataset for slot filling

Step 2: BERT Tokenization

import torch

from transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

Step 3: Model Architecture

model = BertForTokenClassification.from_pretrained('bert-base-uncased', num_labels=num_labels)

# num_labels: number of slot labels

Step 4: Fine-Tuning

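# Create the optimizer before the training loop
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

model.train()

for epoch in range(num_epochs):
    total_loss = 0
    for batch in training_data:
        inputs = tokenizer(batch['text'], truncation=True, padding=True, return_tensors='pt')
        # batch['labels'] is assumed to hold label ids already aligned to the
        # subword tokens and padded (with -100) to the same shape as input_ids
        labels = torch.tensor(batch['labels'])
        optimizer.zero_grad()
        outputs = model(**inputs, labels=labels)
        loss = outputs.loss
        loss.backward()   # compute gradients
        optimizer.step()  # update model parameters
        total_loss += loss.item()
    print('Epoch:', epoch, 'Loss:', total_loss)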

Step 5: Inference


def predict_slots(query):
    inputs = tokenizer(query, truncation=True, padding=True, return_tensors='pt')
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    predicted_labels = torch.argmax(logits, dim=2).squeeze(0)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    # Map predicted label ids to label names; set id2label in the model config
    # when loading the model, otherwise generic names such as LABEL_0 are used
    slots = [model.config.id2label[pred.item()] for pred in predicted_labels]
    results = []
    for token, slot in zip(tokens, slots):
        # Skip special tokens that do not belong to the query
        if token in ('[CLS]', '[SEP]', '[PAD]'):
            continue
        results.append((token, slot))
    return results

Example Usage

query = "Book a flight from New York to London"

slots = predict_slots(query)

for token, slot in slots:

    print(token, '->', slot)

In the code snippet above, you can replace ‘bert-base-uncased’ with the appropriate BERT model name based on the requirements. Adjust the hyperparameters like learning_rate, num_epochs, and the training data format according to the specific dataset and setup. Customize the input and output formats to align with your dataset’s structure.

Remember to preprocess your labeled dataset and convert it into batches for training. The training_data variable in the code represents the input training data in batches.
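One possible way to build such batches is sketched below. It assumes each example is a (text, label_ids) pair whose label ids are already aligned to the subword tokens, special tokens included, and it pads the label lists with -100 so that padded positions are ignored by the loss. The `make_batches` helper is hypothetical.

```python
def make_batches(examples, batch_size, pad_label=-100):
    """Yield dicts with 'text' and 'labels', padding label lists to equal length."""
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        width = max(len(labels) for _, labels in chunk)
        yield {
            "text": [text for text, _ in chunk],
            "labels": [labels + [pad_label] * (width - len(labels))
                       for _, labels in chunk],
        }

# Toy examples with made-up label ids (0 = O, 1 = a city slot).
examples = [
    ("fly to london", [-100, 0, 0, 1, -100]),
    ("hi", [-100, 0, -100]),
    ("to paris", [-100, 0, 1, -100]),
]

training_data = list(make_batches(examples, batch_size=2))
print(len(training_data))            # 2
print(training_data[0]["labels"][1])  # [-100, 0, -100, -100, -100]
```

For the labels to line up with the model inputs, the padded label width must match the padded `input_ids` width produced by the tokenizer for the same batch.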

The predict_slots function takes a user query, tokenizes it using the BERT tokenizer, and feeds it through the fine-tuned model. It then predicts the slot labels for each token and returns the results.


Slot filling is a fundamental component of conversational AI systems, enabling an accurate understanding of user intents and personalized responses. The integration of BERT has revolutionized slot-filling applications thanks to its contextual understanding, handling of ambiguity, OOV resolution, and fine-tuning capabilities.

Key takeaways:

  1. By leveraging BERT’s powerful representations and state-of-the-art NLP techniques, conversational AI systems can provide more accurate and context-aware responses, enhancing user experiences.
  2. As BERT continues to evolve and researchers explore novel techniques in conversational AI, we can expect further advancements in slot filling and other natural language understanding tasks.
  3. By harnessing the power of BERT and combining it with other components of conversational AI, we can look forward to more intelligent and intuitive chatbots and virtual assistants that cater to our needs with remarkable precision.

Frequently Asked Questions

Q1: What is slot filling in Conversational AI, and why is it important?

A. Slot filling is extracting specific pieces of information, known as slots, from user queries in Conversational AI systems. It is essential because accurate slot filling helps understand user intent and enables personalized and context-aware responses. The system can provide relevant and precise information by extracting slot values such as dates, locations, or preferences.

Q2: How does BERT improve slot filling in Conversational AI?

A. BERT (Bidirectional Encoder Representations from Transformers) enhances slot filling by leveraging its contextual understanding and pre-training on vast text data. BERT’s contextualized representations capture the relationships between words and phrases, aiding in accurate slot boundary identification and disambiguation. Its handling of ambiguity and out-of-vocabulary terms, together with task-specific fine-tuning, further improves slot-filling performance.

Q3: Can BERT handle multiple slots and complex queries in slot filling?

A. Yes, BERT can handle multiple slots and complex queries effectively. It comprehends the contextual nuances within the input sequence, enabling accurate extraction of multiple slots simultaneously.

Q4: How is slot filling implemented with BERT in Conversational AI?

A. Implementing slot filling with BERT involves several steps. First, we prepare a labeled dataset with user queries and corresponding slot labels. Next, we apply BERT’s tokenization to convert the queries into BERT’s input format. We construct a model architecture with BERT as the base encoder, followed by a slot classification layer. Then, we fine-tune the model on the labeled dataset. During inference, the query is tokenized, the model predicts a slot label for each token, and the slot values are extracted to generate appropriate responses.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
