“Machine intelligence is the last invention that humanity will ever need to make.” — Nick Bostrom
As we have already discussed RNNs in my previous post, it’s time we explore the LSTM architecture and how it handles long memories. Since LSTMs build on the idea of carrying previous knowledge forward, it would be good for you to have a look at my previous article on RNNs first ( relatable, right? ).
Let’s take an example: suppose I show you an image, and after 2 minutes I ask you about it, you will probably remember its content. But if I ask about the same image some days later, the information might have faded or been lost entirely, right? The first situation is where RNNs suffice ( shorter memories ), while the second is where we need LSTMs and their long memory capacity. This clears some doubts, right?
For more clarification, let’s take another example: suppose you are watching a movie without knowing its name ( e.g. Justice League ). In one frame you see Ben Affleck and think this might be a Batman movie; in another frame you see Gal Gadot and think it could be Wonder Woman, right? But after seeing a few more frames you can be sure that this is Justice League, because you are using knowledge acquired from past frames. This is exactly what the LSTM architecture does, by using the following mechanisms:
1. Forgetting Mechanism: Forget all scene-related information that is not worth remembering.
2. Saving Mechanism: Save information that is important and can help in the future.
Now that we know when to use LSTMs, let’s discuss the basics of the architecture.
LSTMs deal with both Long Term Memory (LTM) and Short Term Memory (STM), and to keep the calculations simple and effective they use the concept of gates.
The above figure shows the simplified architecture of an LSTM. The actual mathematical architecture is represented in the following figure:
Don’t go haywire over this architecture; we will break it down into simpler steps, which will make it a piece of cake to grasp.
1. The Learn Gate: Takes the current event ( Et ) and the previous Short Term Memory ( STMt-1 ) as input and keeps only the information relevant for prediction.
Source: Udacity
2. The Forget Gate: Takes the previous Long Term Memory ( LTMt-1 ) as input and decides which information should be kept and which forgotten.
3. The Remember Gate: Combines the output of the Forget Gate ( what survived from LTMt-1 ) with the output of the Learn Gate to produce the updated Long Term Memory ( LTMt ).
Source: Udacity
4. The Use Gate: Combines important information from the updated Long Term Memory and the previous Short Term Memory to create the Short Term Memory ( STMt ) for the next cell, which is also the output for the current event. A code sketch tying all four gates together follows below.
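To make these four gates concrete, here is a minimal NumPy sketch of a single LSTM step under this learn/forget/remember/use formulation. The function name lstm_step and the weight dictionaries W and b are hypothetical names introduced only for illustration; real frameworks implement the same math in fused, optimized kernels.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(event, stm_prev, ltm_prev, W, b):
    """One LSTM step with the learn / forget / remember / use gates.
    W and b are dicts of (hypothetical) weight matrices and bias vectors."""
    x = np.concatenate([stm_prev, event])  # combine STM(t-1) and event E(t)

    # Learn gate: propose new information and decide how much of it to keep
    candidate = np.tanh(W["n"] @ x + b["n"])
    ignore = sigmoid(W["i"] @ x + b["i"])
    learned = candidate * ignore

    # Forget gate: decide how much of the old long term memory survives
    forget = sigmoid(W["f"] @ x + b["f"])
    kept = ltm_prev * forget

    # Remember gate: new LTM = what we kept + what we just learned
    ltm = kept + learned

    # Use gate: expose part of the new LTM as the new STM / output
    use = sigmoid(W["u"] @ x + b["u"])
    stm = use * np.tanh(ltm)

    return stm, ltm
```

Looping this function over a sequence while carrying stm and ltm forward reproduces the unrolled chain shown in the architecture figure.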
Now scroll up to the architecture figure and map these calculations onto it, and you will have your LSTM ready.
Thanks to their gated cell state, LSTMs largely avoid the Vanishing Gradient problem ( gradients shrinking so much during backpropagation that earlier time steps stop learning ), but they can still face Exploding Gradients ( gradients growing so large that training becomes unstable ), which is usually handled with gradient clipping. Training LSTMs can be easily done using Python frameworks like TensorFlow, PyTorch, Theano, etc., and the catch is the same as with RNNs: we would need a GPU for training deeper LSTM networks.
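As a quick illustration, here is a minimal sketch of an LSTM sequence classifier in TensorFlow/Keras. The vocabulary size and layer widths are assumptions chosen only for the example; note the clipnorm argument, which applies the gradient clipping mentioned above.

```python
import tensorflow as tf

# A tiny LSTM text classifier; vocabulary size and layer widths here
# are illustrative assumptions, not values from the article.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10_000, output_dim=64),
    tf.keras.layers.LSTM(128),                       # the LSTM layer itself
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary prediction
])

# clipnorm caps the gradient norm, guarding against exploding gradients
model.compile(
    optimizer=tf.keras.optimizers.Adam(clipnorm=1.0),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(...) would then train on padded integer sequences
```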
Since LSTMs take care of long-term dependencies, they are widely used in tasks like Language Generation, Voice Recognition, and Image OCR models. The technique is also getting noticed in Object Detection ( mainly scene text detection ).
In essence, LSTMs extend RNNs with gates that manage the flow of memory, giving them the capacity to retain and use information over long sequences. Despite challenges like exploding gradients, they find crucial application in tasks such as language generation, voice recognition, and image OCR, and their expanding role in areas like scene text detection points to even wider use ahead.