This article was published as a part of the Data Science Blogathon.
Dialogue Summarization: Its types and methodology
Image cc: Aseem Srivastava
Summarizing long pieces of text is a challenging problem. Summarization is done primarily in two ways: the extractive approach and the abstractive approach. In this work, we break the problem of meeting summarization into extractive and abstractive components, which together generate a summary of the conversation.
What is Dialogue Summarization?
- We present a novel approach to working with long summaries.
- We beat the state-of-the-art (SOTA) results on the AMI meeting dataset.
Let’s Dive into the Methodology
The input to the encoder-decoder model is a 512-token embedding, generated by processing the AMI input through a BERT-based extractive summarizer in which we apply k-means clustering to BERT sentence-level embeddings. We use two variants of this approach: with and without fine-tuning.
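The extractive step described above can be sketched as follows: cluster the sentence embeddings with k-means and, for each cluster, keep the sentence closest to the centroid. This is a minimal illustration only; the random vectors stand in for real BERT sentence embeddings, and the function name and cluster count are our own choices, not the authors'.

```python
import numpy as np
from sklearn.cluster import KMeans

def extractive_summary(sentences, embeddings, n_clusters=3):
    """Cluster sentence embeddings with k-means and keep, for each
    cluster, the sentence nearest its centroid (in document order)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    picked = []
    for c in range(n_clusters):
        idxs = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[idxs] - km.cluster_centers_[c], axis=1)
        picked.append(idxs[np.argmin(dists)])
    return [sentences[i] for i in sorted(picked)]

# Placeholder vectors stand in for 768-dimensional BERT sentence embeddings.
rng = np.random.default_rng(0)
sents = [f"sentence {i}" for i in range(10)]
embs = rng.normal(size=(10, 768))
summary = extractive_summary(sents, embs, n_clusters=3)
```

The selected sentences (one representative per cluster) would then be concatenated and fed to the abstractive encoder-decoder model.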
Extractive Summarization Approach
Abstractive Summarization Approach
This is an incredibly difficult task that may seem impossible even for people, and we do not expect the model to solve it perfectly. However, it is a challenging task that encourages the model to learn about language and facts about the world, as well as how to distill information spread throughout a document into output that closely relates to the fine-tuning task. The advantage of this self-supervision is that we can generate as many training examples as there are documents, without any annotation, which is often the bottleneck in supervised systems.
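The self-supervised setup described above can be illustrated with a toy sketch: mask one sentence of a document and treat the masked sentence as the target the model must generate. The function name, sentence splitter, and mask token below are illustrative assumptions, not the authors' implementation.

```python
import re

MASK = "<mask>"  # illustrative mask token

def gap_sentence_example(document, target_idx):
    """Build one self-supervised (input, target) pair: mask one sentence
    in the document; the masked sentence becomes the generation target."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    target = sentences[target_idx]
    masked = sentences[:]
    masked[target_idx] = MASK
    return " ".join(masked), target

doc = "The meeting started late. The team chose a remote design. Costs were reviewed."
src, tgt = gap_sentence_example(doc, 1)
# src masks the second sentence; tgt is the sentence the model must produce.
```

Because any sentence of any document can serve as a target, training pairs can be produced at corpus scale with no human annotation.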
About the Metric: ROUGE 1 and ROUGE 2
These metrics are based on unigram and bigram overlap, respectively (a skip-bigram variant, ROUGE-SU4, additionally allows a maximum skip distance of four), and are highly correlated with human evaluations. ROUGE-2 scores can be seen as a measure of summary readability. An alternative way to evaluate the performance of an approach is human evaluation.
But human evaluation is slow and very expensive. Empirical studies show that a model's performance can be judged accurately using ROUGE metrics, which agree well with human evaluation. Table 1 summarizes all the experiments performed on the summarization task on the AMI dataset. The results clearly show improvement by a considerable margin.
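The ROUGE-N scores discussed above reduce to clipped n-gram overlap between a candidate summary and a reference. A minimal sketch (the tokenization, F1 formulation, and example strings are our own illustrative choices):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter & gives clipped counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

ref = "the committee approved the remote control design"
cand = "the committee approved the design"
r1 = rouge_n(cand, ref, 1)  # unigram overlap -> ROUGE-1
r2 = rouge_n(cand, ref, 2)  # bigram overlap -> ROUGE-2
```

In practice one would use an established package rather than this sketch, but it shows why ROUGE-2 is stricter than ROUGE-1: matching bigrams requires matching word order, which is why it tracks readability more closely.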
So in the end, let’s see whether this work is genuinely novel or something that turns up on the first (or last) page of a Google search!
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.