Memory Networks for Q&A (Question and Answer) Applications
This article was published as a part of the Data Science Blogathon
Introduction
Table of Contents
- What is the motivation behind Memory Networks?
- Why do we need Memory Networks when traditional NLP models are already performing well?
- Facebook bAbI dataset
- About Supporting Fact
- Components of Memory Networks
- How can we find the best match?
- How does the dot product find the matching?
- Sample Q&A application
- Endnotes
What is the motivation behind Memory Networks?
Why do we need Memory Networks when traditional NLP models are already performing well?
Facebook bAbI dataset

Source: The Facebook bAbI project, dataset
We are going to focus on only two of the 20 tasks mentioned above. These tasks expect a single-word answer rather than a full sentence.
- Task 1 (Single Supporting Fact): the task asks for the location of a person who performed an action. The answer comes from a single previously given sentence.
- Task 2 (Two Supporting Facts): the task asks for the location of a person or an object. Answering requires chaining two supporting statements.
Each task consists of:
- A set of facts or a story: <list of sentences>
- Question based on given story: <single sentence>
- Answer: <single word>
About Supporting Fact
Components of Memory Networks

Input:
- Stories and questions are the inputs.
- We need a single vector to represent each sentence, so the input stories are converted into word embeddings and then combined into story sentence vectors.
- The same procedure is applied to the question to form a question sentence vector.
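As a rough sketch of this step, a sentence vector can be formed by summing the word embeddings of its words (a bag-of-words encoding). The vocabulary, embedding size, and random initialisation below are illustrative assumptions, not the trained model:

```python
import numpy as np

# Toy vocabulary and randomly initialised word embeddings; in a real model
# the embeddings are learned during training.
vocab = {"sandra": 0, "went": 1, "to": 2, "the": 3, "kitchen": 4, "where": 5, "is": 6}
embed_dim = 8
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), embed_dim))

def sentence_vector(sentence: str) -> np.ndarray:
    """Represent a sentence as the sum of its word embeddings (bag of words)."""
    ids = [vocab[w] for w in sentence.lower().split()]
    return embeddings[ids].sum(axis=0)

story_vec = sentence_vector("Sandra went to the kitchen")
question_vec = sentence_vector("Where is Sandra")
print(story_vec.shape)  # (8,)
```

The same `sentence_vector` routine is applied to both story sentences and the question, so the two live in the same embedding space and can later be compared.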
Memory:
- It takes the input sentence vectors and stores them into the next available memory slot.
Output:
- It takes the question and loops over all the memories.
- It scores the question against each story sentence vector and selects the best match, i.e. the memory with the highest score.
- It generates a feature vector for the answer; this answer vector represents the most relevant sentence.
- When a task has two supporting facts, the process is extended to two blocks, and this is where memory networks start to look somewhat recurrent: the first block uses the question vector to produce an answer vector, and the second block uses that answer vector from the first block to produce the final answer vector. These blocks are called memory hops.
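The hop mechanism described above can be sketched as follows. The helper name `memory_hop`, the toy sizes, and the random vectors are illustrative assumptions, not the trained model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(memories: np.ndarray, query: np.ndarray) -> np.ndarray:
    """One memory hop: score the query against every stored sentence vector
    with a dot product, softmax the scores into attention weights, and return
    the weighted sum of memories as the answer feature vector."""
    scores = memories @ query   # one score per memory slot
    weights = softmax(scores)   # probability distribution over memory slots
    return weights @ memories   # attention-weighted combination of memories

rng = np.random.default_rng(1)
memories = rng.normal(size=(6, 8))  # six story sentence vectors, 8-d (toy sizes)
question = rng.normal(size=8)

answer1 = memory_hop(memories, question)  # hop 1: driven by the question
answer2 = memory_hop(memories, answer1)   # hop 2: driven by hop 1's output
print(answer2.shape)  # (8,)
```

For a single-supporting-fact task, one hop suffices; for two supporting facts, hop 2 re-queries the same memories with hop 1's answer vector, which is what lets two statements be chained.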
Response:
- It takes the answer feature vector and generates the best single word using softmax.
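As a sketch of this last step, the answer feature vector can be projected to one score per vocabulary word and passed through a softmax; the highest-probability word is the response. The projection matrix `W` is learned in a real model and random here purely for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

vocab = ["bathroom", "office", "bedroom", "hallway", "kitchen"]
embed_dim = 8
rng = np.random.default_rng(2)

# Learned projection from the answer feature vector to vocabulary logits
# (randomly initialised here as an illustrative stand-in).
W = rng.normal(size=(len(vocab), embed_dim))
answer_vec = rng.normal(size=embed_dim)

probs = softmax(W @ answer_vec)           # distribution over answer words
predicted_word = vocab[int(np.argmax(probs))]
print(predicted_word)
```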
How can we find the best match?
How does the dot product find the matching?
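Intuitively, with bag-of-words sentence vectors, the dot product between the question vector and a story sentence vector grows with the words (or embedding directions) the two sentences share, so the sentence most related to the question scores highest. A minimal sketch with one-hot "embeddings", an illustrative assumption that makes the word-counting exact:

```python
import numpy as np

# One-hot word "embeddings": the dot product between two bag-of-words
# sentence vectors then literally counts their shared words.
vocab = ["sandra", "bathroom", "mary", "bedroom", "where", "is"]
E = np.eye(len(vocab))
w = {word: E[i] for i, word in enumerate(vocab)}

story_vecs = np.stack([
    w["sandra"] + w["bathroom"],  # "Sandra went back to the bathroom"
    w["mary"] + w["bedroom"],     # "Mary journeyed to the bedroom"
])
question_vec = w["where"] + w["is"] + w["sandra"]  # "Where is Sandra?"

scores = story_vecs @ question_vec
print(scores)                  # [1. 0.] -> sentence 0 shares the word "sandra"
print(int(np.argmax(scores)))  # 0
```

With learned embeddings the counting becomes soft (related words have similar vectors, so they also contribute to the score), but the mechanism is the same: the matching sentence gets the highest dot product, which the softmax then turns into the near-1.0 memory weights shown in the tables below.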
Sample Q&A application
Table 1: Memory vector values on Task 1 Single Supporting Fact
| Story: Task 1 Single Supporting Fact | Question | Memory Vector |
| --- | --- | --- |
| Sandra traveled to the bathroom | Where is Sandra? | 0.00002 |
| Sandra journeyed to the office | Where is Sandra? | 0.00093 |
| Mary journeyed to the bedroom | Where is Sandra? | 0.00000 |
| John moved to the hallway | Where is Sandra? | 0.00000 |
| Sandra went back to the bathroom | Where is Sandra? | 0.99903 |
| John went to the bedroom | Where is Sandra? | 0.00002 |
Question: Where is Sandra?
| Story 2: Task 2 Two Supporting Facts | Question | Memory Hop 1 | Memory Hop 2 |
| --- | --- | --- | --- |
| John moved to the hallway | Where is the apple? | 0.00000 | 0.00000 |
| Sandra moved to the kitchen | Where is the apple? | 0.00000 | 0.00000 |
| Daniel traveled to the garden | Where is the apple? | 0.00000 | 0.00000 |
| Mary went back to the office | Where is the apple? | 0.00000 | 1.00000 |
| Mary got the apple there | Where is the apple? | 0.01279 | 0.00000 |
| Mary dropped the apple | Where is the apple? | 0.98721 | 0.00000 |
| Daniel journeyed to the bedroom | Where is the apple? | 0.00000 | 0.00000 |
| Daniel went to the bathroom | Where is the apple? | 0.00000 | 0.00000 |
Question: Where is the apple?
Endnotes
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.