Comparison of Text Generations from GPT and GPT-2

Drishti Sharma 04 Dec, 2022 • 4 min read

This article was published as a part of the Data Science Blogathon.

GPT and GPT-2

 Source: Canva


Real-world data can be messy and skewed, which can undermine the effectiveness of a predictive model if the skew is not identified and addressed in time.

The consequences of skewness become more pronounced when a large model is trained on a skewed dataset, because retraining such a model from scratch is often impractical. Besides that, if such a model is placed into production immediately, we must be ready for the implications.

This article tests the GPT and GPT-2 models for genre skewness. I came across this idea while reading the book Natural Language Processing with Transformers (which I heartily recommend), so I decided to document my own experience and share it with you all.

Now, let’s begin!

Task Overview

We will use the pre-trained GPT (openai-gpt) and GPT-2 models from the Hugging Face Hub. We will also use Hugging Face’s text-generation pipeline to check whether skewness (due to over- or under-representation in the training data) is evident in the text generated by GPT and GPT-2.

Datasets Used for Training GPT and GPT-2

GPT was trained on the BookCorpus dataset, which consists of about 7,000 unpublished books, while GPT-2 was trained on WebText, a corpus of web pages scraped from links shared on Reddit.

Before comparing the generations, let’s make sure that the two models are of similar size so that the comparison is fair.

Ensuring That We Are Comparing Similar-Sized Versions of Both Models

For this, first off, we will install transformers and import the necessary libraries.

!pip install transformers
from transformers import pipeline, set_seed

Next, we will define the names of the models we will compare.

model_name1 = "openai-gpt"
model_name2 = "gpt2"

Following that, we will set up a pipeline for the text-generation task for each model.

text_generation_gpt = pipeline("text-generation", model = model_name1)
text_generation_gpt2 = pipeline("text-generation", model = model_name2)

Now, we will define a function for calculating the number of parameters in each model.

def model_size(model):
  return sum(params.numel() for params in model.parameters())
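To make the counting logic concrete without downloading a model, here is a minimal sketch that applies the same idea to plain shape tuples. The layer shapes are hypothetical and are not taken from GPT or GPT-2; `count_parameters` is an illustrative helper, not part of any library.

```python
from math import prod

# Hypothetical parameter shapes for a tiny linear layer (an assumption for
# illustration): a 4x3 weight matrix and a bias vector of length 4.
toy_shapes = {"weight": (4, 3), "bias": (4,)}

def count_parameters(shapes):
    """Sum the element counts of all tensors, mirroring sum(p.numel() ...)."""
    return sum(prod(shape) for shape in shapes.values())

total = count_parameters(toy_shapes)
print(total)  # 4*3 + 4 = 16
```

Each tensor contributes the product of its dimensions, which is what `numel()` reports for a real parameter tensor.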

Next, we print the number of parameters in GPT and GPT-2.

print(f"Number of Parameters in GPT: {model_size(text_generation_gpt.model)/1000**2:.1f}M parameters")
print(f"Number of Parameters in GPT-2: {model_size(text_generation_gpt2.model)/1000**2:.1f}M parameters")

>> Output: 

Number of Parameters in GPT: 116.5M parameters
Number of Parameters in GPT-2: 124.4M parameters

Hence, the two models are of comparable size, which makes the comparison reasonably fair.
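To put the gap between the two counts in perspective, here is a quick calculation that uses only the figures printed above. The variable names are my own.

```python
# Parameter counts reported above, in millions.
gpt_params_m = 116.5
gpt2_params_m = 124.4

abs_gap_m = gpt2_params_m - gpt_params_m  # absolute difference, in millions
rel_gap = abs_gap_m / gpt_params_m        # difference relative to GPT

print(f"GPT-2 has {abs_gap_m:.1f}M more parameters ({rel_gap:.1%} larger than GPT)")
# prints: GPT-2 has 7.9M more parameters (6.8% larger than GPT)
```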

Comparison of Text Generated by GPT and GPT-2

Now we will define a function to generate completions from each model.

def enum_pipeline_outputs(pipe, prompt, num_return_sequences):
  # Generate several completions for the same prompt.
  out = pipe(prompt, num_return_sequences = num_return_sequences, clean_up_tokenization_spaces = True)
  # Number each completion and put it on its own line.
  return "\n".join(f"{i+1}." + s["generated_text"] for i,s in enumerate(out))
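The helper relies on the pipeline returning a list of dictionaries, each with a "generated_text" key. The sketch below checks the numbering and line-joining logic with a stand-in pipeline, so it runs without downloading a model. Both `fake_pipe` and `number_outputs` are hypothetical names used only for this illustration.

```python
def fake_pipe(prompt, num_return_sequences, **kwargs):
    """Hypothetical stand-in for a text-generation pipeline: returns canned dicts."""
    return [{"generated_text": f"{prompt} (sample {i})"}
            for i in range(num_return_sequences)]

def number_outputs(pipe, prompt, num_return_sequences):
    """Number each completion and join the completions with newlines."""
    out = pipe(prompt, num_return_sequences=num_return_sequences)
    return "\n".join(f"{i + 1}. " + s["generated_text"] for i, s in enumerate(out))

listing = number_outputs(fake_pipe, "Hello", 2)
print(listing)
# 1. Hello (sample 0)
# 2. Hello (sample 1)
```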

We will use a single prompt and generate four text completions from each model so that we can compare the generated text.

prompt = "Before they left for the supermarket"
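Note that the completions are sampled, so they will differ from run to run. We imported set_seed earlier but have not called it. Calling it before each pipeline call (for example set_seed(42), where 42 is an arbitrary choice) is one way to make a run repeatable. The toy sketch below uses Python's random module and hypothetical word weights rather than a real model. It only illustrates why fixing the seed reproduces the same draws.

```python
import random

# Hypothetical next-word candidates and weights (assumptions for illustration);
# a real language model scores its entire vocabulary at every step.
candidates = ["she", "he", "they", "the"]
weights = [0.4, 0.3, 0.2, 0.1]

def sample_words(seed, k=5):
    """Draw k words from the toy distribution with a dedicated, seeded generator."""
    rng = random.Random(seed)
    return rng.choices(candidates, weights=weights, k=k)

first = sample_words(seed=42)
second = sample_words(seed=42)
print(first == second)  # True: the same seed gives the same sequence of draws
```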

I) Generating four output text completions for GPT

print("Text Generated by GPT for the given prompt:\n" + enum_pipeline_outputs(text_generation_gpt, prompt, 4))

>> Output of GPT model:

Text Generated by GPT for the given prompt:
1.Before they left for the supermarket. 
 as she was preparing a pot of coffee the telephone rang. she put it to her ear. " hi, it's me. " 
 " you've got a visitor. we got the new computer i'm
2.Before they left for the supermarket. " but since he was still holding her captive, and he hadn't released her yet, she didn't understand why he felt the need to keep all her plans a secret from her. 
 he let go of the
3.Before they left for the supermarket. " 
 i was shocked. " he's... he's not in love with you. " 
 " he never was. he never will be again. it's over and over. this is the end for both
4.Before they left for the supermarket. i've already eaten breakfast now and i think i 'll put in a few hours in the gym this morning just to give myself time to go to the bathroom and clean up and get the better of it, but i

II) Generating four output text completions for GPT-2

print("Text Generated by GPT-2 for the given prompt:\n" + enum_pipeline_outputs(text_generation_gpt2, prompt, 4))

>> Output of GPT-2 model:

Text Generated by GPT-2 for the given prompt:
1. Before they left for the supermarket, the family returned to the warehouse to check on them. According to the police, there were three suspicious items on the shelves and an object that looked like a toy or a piece of glass.
2. Before they left for the supermarket, Gai said that when he first came up in this world, it was like, “I don’t know, the world is coming to me, but it’s not coming from the home.” That made me feel more alive
3. Before they left for the supermarket, he opened the door and opened the door a little deeper. When they stopped, he said, they made a couple of attempts to get away – and I said my name just so I could hear them – then one
4. Before they left for the supermarket, I knew that it was impossible to see the other side of the house and that it was just as bad as the pictures make it sound. At the supermarket, there was a little window leading out onto a very small street and

Observation: Even from this handful of outputs, the GPT completions suggest a genre skew toward romance. This highlights the challenges of creating a large text corpus. The biases in a model’s behavior also need to be considered with respect to the target audience that will interact with the model.
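One rough way to go beyond eyeballing is to count gendered versus neutral pronouns in the generated text. The sketch below is only a starting point under stated assumptions. The pronoun lists are my own choice, the helper `pronoun_counts` is hypothetical, and the inputs are three short excerpts from the completions above (with the prompt itself left out), which is far too little text to conclude anything.

```python
import re
from collections import Counter

# Pronoun groups chosen for illustration only; these word lists are assumptions.
GENDERED = {"he", "she", "him", "her", "his", "hers"}
NEUTRAL = {"they", "them", "their", "theirs"}

def pronoun_counts(texts):
    """Count gendered vs. neutral pronouns across a list of generated texts."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in GENDERED:
                counts["gendered"] += 1
            elif word in NEUTRAL:
                counts["neutral"] += 1
    return counts

# Short excerpts from the completions above, excluding the shared prompt.
gpt_excerpts = ["she put it to her ear.", "he let go of the"]
gpt2_excerpts = ["the family returned to the warehouse to check on them."]

print(pronoun_counts(gpt_excerpts))   # Counter({'gendered': 3})
print(pronoun_counts(gpt2_excerpts))  # Counter({'neutral': 1})
```

Running the same count over many completions per model, generated from several prompts, would give a more meaningful picture than these excerpts.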


This article compared text generated by GPT and GPT-2 to test whether genre skewness is evident in the outputs of the two models.

To summarize, the key takeaways from this article are:

1. GPT shows a genre skew toward “romance” due to a strong overrepresentation of romance novels in BookCorpus. It often imagines a romantic interaction between a man and a woman.

2. GPT-2 was trained on web text linked from Reddit. Hence, it mostly adopts the neutral “they” in its text generations, which have blog-like or adventure-like elements.

3. The results highlight the challenges that we can face, and that should be addressed, while creating a large text corpus. Moreover, the biases in the behavior of the model need to be considered with respect to the target audience that will interact with the model.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
