Exploring the Extractive Method of Text Summarization

Shilpi Mazumdar 04 Apr, 2023 • 14 min read

Introduction

We often don’t have enough time to read and understand lengthy documents, research papers, or news articles. At the same time, summarizing a large volume of text while retaining essential information is crucial in many fields, such as journalism, research, and business. This is where NLP text summarization comes into play: a technique that automatically generates a condensed version of a given text while preserving its essential meaning. In this article, we will explore the two main approaches to NLP text summarization, namely extractive and abstractive, and examine their applications, strengths, and weaknesses.

Learning Objectives

In this article, you will:

  1. Understand the different categories of text summarization.
  2. Understand the extractive and abstractive approaches through examples.
  3. Learn the differences between the two summarization techniques.
  4. Explore the future of text summarization.

Types of Text Summarization

Broadly, NLP text summarization can be divided into two main categories:

  • Extractive Approach
  • Abstractive Approach

Let’s dive a little deeper into each of the above-mentioned categories.

Extractive Summarization

So, what exactly happens in the extractive summarization method? It simply takes out the important sentences or phrases from the original text and joins them to form a summary.

Now the question is: on what basis are those sentences deemed important? A ranking algorithm is used that assigns a score to each sentence in the text based on its relevance to the overall meaning of the document. The most relevant sentences are then chosen to be included in the summary.

There are various ways in which the ranking of sentences can be performed:

  • TF-IDF (term frequency-inverse document frequency)
  • Graph-based methods such as TextRank (a minimal sketch is shown below)
  • Machine learning-based methods such as Support Vector Machines (SVM) and Random Forests
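For instance, a graph-based ranking in the spirit of TextRank can be put together in a few lines. The sketch below is only an illustration (it assumes nltk, scikit-learn, and networkx are installed), not the exact algorithm of any particular library:

# A minimal TextRank-style sketch: sentences are graph nodes, similarities are edge weights
import networkx as nx
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(text, n):
    sentences = sent_tokenize(text)
    # build a sentence-to-sentence similarity matrix from TF-IDF vectors
    tfidf = TfidfVectorizer(stop_words='english').fit_transform(sentences)
    sim_matrix = cosine_similarity(tfidf)
    # rank the sentences with PageRank over the similarity graph
    graph = nx.from_numpy_array(sim_matrix)
    scores = nx.pagerank(graph)
    top = sorted(range(len(sentences)), key=scores.get, reverse=True)[:n]
    return ' '.join(sentences[i] for i in sorted(top))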

The main motive of the extractive method is to maintain the original meaning of the text. Also, this method works well when the input text/content is already in a well-structured manner, both physically and logically, just like the content in newspapers.

Abstractive Summarization

Okay, now let’s come to the abstractive summarization method. The name comes from the word abstract, meaning an outline, summary, or basic idea of a larger body of text. Unlike the extractive method, it doesn’t simply pick out the important sentences; rather, it analyzes the input text and generates new phrases or sentences that capture the essence of the original text and convey the same meaning more concisely and coherently.

Again, how exactly is the summary generated in this method? So, in brief, the input text is analyzed by a neural network model that learns to generate new phrases and sentences that capture the essence of the original text. The model is trained on large amounts of text data and learns to understand the relationships between words and sentences, and generates new text that conveys the same meaning as the original text in a more understandable manner.

This method uses advanced NLP techniques such as natural language generation (NLG) and deep learning to understand the context and generate the summary. The resulting summaries are usually shorter and more readable than the ones generated by the extractive method, but they can sometimes contain errors or inaccuracies.
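As a quick, hedged illustration of what this looks like in practice, the snippet below runs a pre-trained summarization pipeline from the Hugging Face transformers library (the library and the default model it downloads are assumptions for this example, not something the rest of this article depends on):

# a minimal sketch of abstractive summarization with a pre-trained transformer
# (assumes the transformers library is installed; its default model is used only for illustration)
from transformers import pipeline

summarizer = pipeline("summarization")
text = ("Weather is the day-to-day or hour-to-hour change in the atmosphere. "
        "Weather includes wind, lightning, storms, hurricanes, tornadoes, rain, hail, snow, and lots more.")
print(summarizer(text, max_length=40, min_length=10, do_sample=False)[0]['summary_text'])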

Note that in this article, we’ll only deal with the extractive text summarization method.

Understanding with Code

Here, we’ll focus on the extractive method and understand it more with an example.

But before that, here is the flow in brief: tokenize the text into sentences, score each sentence, pick the top-scoring sentences, and join them to form the summary.

Here, we will use a Python library called NLTK (Natural Language Toolkit) to implement the extractive method. NLTK provides a wide range of functionalities for natural language processing, including text tokenization, stopword removal, and sentence scoring.

Let’s take a look at the following code that demonstrates how to use NLTK to generate a summary from a given text:

Frequency-based Approach

# import the required libraries
import nltk
nltk.download('punkt')       # punkt tokenizer for sentence tokenization
nltk.download('stopwords')   # list of stop words, such as 'a', 'an', 'the', 'in', etc., which will be dropped

from collections import Counter    # Counter class from the collections module, used for counting the frequency of words in a text
from nltk.corpus import stopwords  # stop words list from the NLTK corpus
# a corpus is a large collection of text or speech data used for statistical analysis

from nltk.tokenize import sent_tokenize, word_tokenize  # sentence and word tokenizers from the NLTK tokenize module
# sent_tokenize splits text into sentences; word_tokenize splits sentences into words


# this function takes 2 inputs: the text, and n, the number of sentences the summary should contain
def generate_summary(text, n):
    # Tokenize the text into individual sentences
    sentences = sent_tokenize(text)

    # Tokenize the text into individual words and remove stopwords
    stop_words = set(stopwords.words('english'))
    # tokenize the text into words with word_tokenize, drop stop words and
    # non-alphanumeric tokens, and convert everything to lowercase
    words = [word.lower() for word in word_tokenize(text)
             if word.lower() not in stop_words and word.isalnum()]

    # Compute the frequency of each word
    word_freq = Counter(words)

    # Compute the score for each sentence based on the frequency of its words.
    # After this loop, sentence_scores will contain a score for each sentence in the text,
    # where each score is the sum of the frequency counts of its constituent words.
    sentence_scores = {}
    for sentence in sentences:
        sentence_words = [word.lower() for word in word_tokenize(sentence)
                          if word.lower() not in stop_words and word.isalnum()]
        sentence_score = sum([word_freq[word] for word in sentence_words])
        # keep only sentences with fewer than 20 content words (this threshold can be adjusted
        # based on the desired length of summary sentences); it filters out very long sentences
        # that could otherwise dominate the summary
        if len(sentence_words) < 20:
            sentence_scores[sentence] = sentence_score

    # Select the top n sentences with the highest scores
    summary_sentences = sorted(sentence_scores, key=sentence_scores.get, reverse=True)[:n]
    summary = ' '.join(summary_sentences)

    return summary

Using a Sample Text From Wikipedia to Generate Summary

text = '''
Weather is the day-to-day or hour-to-hour change in the atmosphere. 
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more. 
Energy from the Sun affects the weather too. 
Climate tells us what kinds of weather usually happen in an area at different times of the year. 
Changes in weather can affect our mood and life. We wear different clothes and do different things in different weather conditions. 
We choose different foods in different seasons.
Weather stations around the world measure different parts of weather. 
Ways to measure weather are wind speed, wind direction, temperature and humidity. 
People try to use these measurements to make weather forecasts for the future. 
These people are scientists that are called meteorologists. 
They use computers to build large mathematical models to follow weather trends.'''

summary = generate_summary(text, 5)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

Output

The following output is what we would be getting as a summary. This summary would contain 5 sentences.

We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of weather.
Climate tells us what kinds of weather usually happen in an area at different times of the year.
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more.
Ways to measure weather are wind speed, wind direction, temperature and humidity.

What’s happening in the above code?
So, the above code takes a text and a desired number of sentences for the summary as input and returns a summary generated using the extractive method. The method first tokenizes the text into individual sentences and then tokenizes each sentence into individual words. Stopwords are removed from the words, and then the frequency of each word is computed.

Then the score for each sentence is computed based on the frequency of its words, and the top n sentences with the highest scores are selected to form the summary. Finally, the summary is generated by joining the selected sentences together.

In the next section, we will explore how the extractive method can be further improved using advanced techniques such as TF-IDF.

TF-IDF Approach

# importing the required libraries

# TfidfVectorizer converts a collection of raw documents into a matrix of TF-IDF features
from sklearn.feature_extraction.text import TfidfVectorizer

# cosine_similarity computes the cosine similarity between two sets of vectors
from sklearn.metrics.pairwise import cosine_similarity

# nlargest returns the n largest elements from an iterable
from heapq import nlargest

# sentence tokenizer (already imported in the previous section; repeated here so this block is self-contained)
from nltk.tokenize import sent_tokenize


def generate_summary(text, n):
    # Tokenize the text into individual sentences
    sentences = sent_tokenize(text)

    # Create the TF-IDF matrix over the sentences, with the full document appended as the last row
    vectorizer = TfidfVectorizer(stop_words='english')
    tfidf_matrix = vectorizer.fit_transform(sentences + [text])

    # Compute the cosine similarity between each sentence and the document (the last row)
    sentence_scores = cosine_similarity(tfidf_matrix[-1], tfidf_matrix[:-1])[0]

    # Select the top n sentences with the highest scores
    summary_sentences = nlargest(n, range(len(sentence_scores)), key=sentence_scores.__getitem__)

    # join the selected sentences in their original order
    summary_tfidf = ' '.join([sentences[i] for i in sorted(summary_sentences)])

    return summary_tfidf

Using a Sample Text to Check the Summary

text = '''
Weather is the day-to-day or hour-to-hour change in the atmosphere. 
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more. 
Energy from the Sun affects the weather too. 
Climate tells us what kinds of weather usually happen in an area at different times of the year. 
Changes in weather can affect our mood and life. We wear different clothes and do different things in different weather conditions. 
We choose different foods in different seasons.
Weather stations around the world measure different parts of weather. 
Ways to measure weather are wind speed, wind direction, temperature and humidity. 
People try to use these measurements to make weather forecasts for the future. 
These people are scientists that are called meteorologists. 
They use computers to build large mathematical models to follow weather trends.'''

summary = generate_summary(text, 5)
summary_sentences = summary.split('. ')
formatted_summary = '.\n'.join(summary_sentences)

print(formatted_summary)

The following output is what we would be getting as a summary. This summary would contain 5 sentences.

Energy from the Sun affects the weather too.
Changes in weather can affect our mood and life.
We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of the weather.
People try to use these measurements to make weather forecasts for the future.

The above code generates a summary for a given text using the TF-IDF approach. The generate_summary function takes a text parameter and an n parameter (the number of sentences in the summary). It tokenizes the text into individual sentences, creates a TF-IDF matrix using the TfidfVectorizer class, and computes the cosine similarity between each sentence and the document using the cosine_similarity function.
Next, the function selects the top n sentences with the highest scores using the nlargest function from the heapq library and joins them into a string using the join method.

Okay, before moving further, let’s quickly understand the cosine similarity. You can jump to the next part if you are already familiar with this.

So, the cosine similarity considers the angle between the vectors of word frequencies for each document rather than just their magnitudes. This means that documents with similar word frequencies and distributions will have a smaller angle between their vectors and thus a higher cosine similarity score. Let’s understand this with a simple example.

We have two sentences.

  1. “I love cats and dogs.”
  2. “I love only cats.”

We first need to convert each sentence into a vector representation to calculate the similarity between these two sentences using cosine similarity with TF-IDF. Here’s how we can do that:

  1. “I love cats and dogs.” -> [1, 1, 1, 1, 1, 1, 0]
  2. “I love only cats.” -> [1, 1, 1, 0, 0, 1, 1]

How are we getting the vector representation? We need to perform the following steps.
1. Break the sentence into individual words -> tokenization:

  • “I love cats and dogs.” -> [‘I’, ‘love’, ‘cats’, ‘and’, ‘dogs’, ‘.’]
  • “I love only cats.” -> [‘I’, ‘love’, ‘only’, ‘cats’, ‘.’]

2. Now, create a vocabulary of unique words from both sentences:
[‘I’, ‘love’, ‘cats’, ‘and’, ‘dogs’, ‘.’, ‘only’]

3. Now convert each sentence into a binary vector of size equal to the vocabulary, where 1 represents the presence of the word in the sentence and 0 represents its absence.
“I love cats and dogs.” -> [1, 1, 1, 1, 1, 1, 0]
Explanation:
‘I’ is present, hence 1
‘love’ is present, hence 1
‘cats’ is present, hence 1
‘and’ is present, hence 1
‘dogs’ is present, hence 1
‘.’ is present, hence 1
‘only’ is absent, hence 0
“I love only cats.” -> [1, 1, 1, 0, 0, 1, 1]
Explanation:
‘I’ is present -> 1
‘love’ is present -> 1
‘cats’ is present -> 1
‘and’ is absent -> 0
‘dogs’ is absent -> 0
‘.’ is present -> 1
‘only’ is present -> 1
Each vector has seven elements corresponding to the seven unique words (including the period) in the vocabulary. The value at each position indicates whether that word is present in its respective sentence.

Next, we would normally weight each of these counts by the word’s inverse document frequency (IDF), which down-weights words that appear in both sentences (such as ‘I’, ‘love’, and ‘cats’) relative to words that appear in only one. With just two short sentences, this weighting adds little, so for simplicity we compute the cosine similarity directly on the binary vectors above.

Finally, we compute the cosine similarity between the two vectors using the formula:

cosine_similarity = (v1 . v2) / (||v1|| * ||v2||)

where v1 and v2 are the vector representations of the sentences, and ‘.’ denotes the dot product of two vectors. ||v1|| and ||v2|| are the Euclidean norms of the two vectors.

Using the vector representations and the formula above, the cosine similarity between the two sentences is:

The dot product of the vectors [1, 1, 1, 1, 1, 1, 0] and [1, 1, 1, 0, 0, 1, 1] is:

1*1 + 1*1 + 1*1 + 1*0 + 1*0 + 1*1 + 0*1 = 4

The magnitude (or Euclidean norm) of the first vector [1, 1, 1, 1, 1, 1, 0] is:
sqrt(1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 1^2 + 0^2) = sqrt(6) ≈ 2.449

Similarly, the magnitude of the second vector [1, 1, 1, 0, 0, 1, 1] is:
sqrt(1^2 + 1^2 + 1^2 + 0^2 + 0^2 + 1^2 + 1^2) = sqrt(5) ≈ 2.236

Therefore, the cosine similarity between the two sentences is:

cosine_similarity = 4 / (2.449 * 2.236) = 4 / 5.477 ≈ 0.73

This indicates that the two sentences are fairly similar, but far from identical.
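The same arithmetic can be reproduced in a couple of lines. The following is a minimal NumPy sketch (NumPy itself is an assumption here, not part of the summarization pipeline above) that computes the cosine similarity of the two binary vectors:

# reproduce the manual cosine similarity calculation with NumPy (illustrative sketch)
import numpy as np

v1 = np.array([1, 1, 1, 1, 1, 1, 0])  # "I love cats and dogs."
v2 = np.array([1, 1, 1, 0, 0, 1, 1])  # "I love only cats."

cos_sim = v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(float(cos_sim), 2))  # prints 0.73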

Evaluation Metrics

Let’s now check how well our approaches are working. We’ll use the same sample text from Wikipedia that we summarized above.
Following is the text.

Weather is the day-to-day or hour-to-hour change in the atmosphere. Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more. Energy from the Sun affects the weather too. Climate tells us what kinds of weather usually happen in an area at different times of the year. Changes in weather can affect our mood and life. We wear different clothes and do different things in different weather conditions. We choose different foods in different seasons.

Weather stations around the world measure different parts of the weather. Ways to measure weather are wind speed, wind direction, temperature and humidity. People try to use these measurements to make weather forecasts for the future. These people are scientists that are called meteorologists. They use computers to build large mathematical models to follow weather trends.

How can we check the accuracy of the above text’s summary when we generate one? So, one way is to use human evaluation as the ground truth. In this approach, we can generate summaries using each method (frequency-based, TF-IDF), and then ask human evaluators to rate the quality of each summary based on different criteria such as coherence, readability, and relevance to the original text. We can then calculate the average score for each method based on the ratings given by the evaluators. This will give us a quantitative measure of the performance of each method.

Another approach is to use ROUGE (Recall-Oriented Understudy for Gisting Evaluation), which is a commonly used metric for evaluating text summarization models. ROUGE measures the overlap between the generated and reference summaries (i.e., the ground truth).

Let’s first go with the human evaluation method.

We got the following summary (5 sentences) as the output using the frequency-based approach.

We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of the weather.
Climate tells us what kinds of weather usually happen in an area at different times of the year.
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more.
Ways to measure weather are wind speed, wind direction, temperature and humidity.

We got the following summary (5 sentences) as the output using the TF-IDF approach.

Energy from the Sun affects the weather too.
Changes in weather can affect our mood and life.
We wear different clothes and do different things in different weather conditions.
Weather stations around the world measure different parts of the weather.
People try to use these measurements to make weather forecasts for the future.

On average, human evaluators rated the frequency-based approach 4/5 and the TF-IDF approach 3/5.

So, as per human evaluation, the frequency-based approach works better.

Now, let’s see how the machine evaluates.

Let’s see the evaluation using ROUGE. Below, we define a human-written reference summary and check how well the automatically generated summary compares to it.

# in case it's not installed on your system
!pip install rouge

from rouge import Rouge

# evaluate_rouge takes two arguments: the reference (human-written) summary and the generated summary.
# It uses the rouge library to compute the ROUGE scores and returns the F1 score of the ROUGE-1 metric.
def evaluate_rouge(reference_text, summary_text):
    rouge = Rouge()
    # get_scores expects the generated (hypothesis) text first and the reference text second
    scores = rouge.get_scores(summary_text, reference_text)
    return scores[0]['rouge-1']['f']


# the following is a human generated summary
reference_summary = '''
Weather is a gradual slow change through days and hours in the atmosphere and can vary from wind to snow. 
Climate tells a lot about the weather in an area.
The livelihood of people changes according to the change in weather.
Weather stations measure different parts of weather.
People who use measurements to make weather forecasts for the future are called meteorologists, and are scientists.'''

# the sample text from Wikipedia
text = '''
Weather is the day-to-day or hour-to-hour change in the atmosphere. 
Weather includes wind, lightning, storms, hurricanes, tornadoes (also known as twisters), rain, hail, snow, and lots more. 
Energy from the Sun affects the weather too. 
Climate tells us what kinds of weather usually happen in an area at different times of the year. 
Changes in weather can affect our mood and life. We wear different clothes and do different things in different weather conditions. 
We choose different foods in different seasons.
Weather stations around the world measure different parts of weather. 
Ways to measure weather are wind speed, wind direction, temperature and humidity. 
People try to use these measurements to make weather forecasts for the future. 
These people are scientists that are called meteorologists. 
They use computers to build large mathematical models to follow weather trends.'''

# Generate summary using frequency-based/TF-IDF approach
summary = generate_summary(text, 5)

# Evaluate the summary using ROUGE
rouge_score = evaluate_rouge(reference_summary, summary)

print(f"ROUGE score: {rouge_score}")

# For frequency based approach we are getting a score of 0.336
# For TF-IDF approach we are getting a score of 0.465

Here, a reference summary and a text are defined. Then, a summary is generated from the text, first using the frequency-based approach and then using the TF-IDF approach. Next, the ROUGE score of the generated summary is evaluated against the reference summary using the evaluate_rouge() function. The ROUGE score measures the similarity between the generated and reference summaries: the higher the ROUGE score, the more similar the two summaries are.

Now, here for the frequency-based approach, we get a score of 0.336; using the TF-IDF approach, we get a score of 0.465. So, in this evaluation method, the TF-IDF approach works better.
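If you keep the two scoring functions around under separate names, the comparison can be run in one small loop. The names generate_summary_freq and generate_summary_tfidf below are hypothetical, since this article defines both functions as generate_summary:

# hypothetical sketch: assumes the two summarizers above were saved under distinct names
for name, summarizer_fn in [("frequency-based", generate_summary_freq),
                            ("tf-idf", generate_summary_tfidf)]:
    score = evaluate_rouge(reference_summary, summarizer_fn(text, 5))
    print(f"{name} ROUGE-1 F1: {score:.3f}")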

Comparison of Extractive and Abstractive Text Summarization

  • Extractive summarization selects and combines existing sentences from the text, so it preserves the original wording and factual information, has lower computational complexity, and works best when the input is already well structured.
  • Abstractive summarization generates new sentences that convey the original meaning, producing shorter, more concise, and more coherent summaries, but it relies on deep learning and natural language generation and can sometimes introduce errors or inaccuracies.

Future Outlook of Text Summarization

The future of this field looks bright, as new techniques and approaches are being explored by R&D teams every day. The use of machine learning and NLP will gradually improve the quality and accuracy of the summaries that are generated.

This includes the use of deep learning models, such as recurrent neural networks and transformers, which lead to a better understanding of what the text is actually about. Additionally, further advances in language generation techniques will lead to more sophisticated abstractive summarization methods.

Ultimately, these advanced solutions will help us save time, increase productivity, and make information more accessible and easily digestible.

Conclusion

Text summarization is a fast-growing field in natural language processing, and it has the potential to revolutionize the way we consume and process information. In this article, we covered

  • Extractive summarization techniques select and combine existing sentences from a text to create a summary. In contrast, abstractive techniques generate new sentences while keeping the essence of the original text intact.
  • Extractive summarization has advantages over abstractive summarization, including higher accuracy, lower computational complexity, and better preservation of factual information.
  • Abstractive summarization has advantages over extractive summarization, including the ability to create more concise and coherent summaries and the potential to capture the overall meaning of a text.
  • Text summarization has many real-world applications, including journalism, finance, healthcare, and the legal industry.
  • As the amount of digital information grows, text summarization will become an essential tool for efficient processing and making sense of large volumes of text.