A Neural Network Algorithm Can Now Make Movies from a Few Lines of Text
- The algorithm can predict the next few frames based on the previous frame
- It is based on a series of neural networks
- Combines text to image and image detection algorithms
- Still at a nascent stage, with plenty of room for improvement
Watch out, Hollywood. Machine learning might be extending its arms to take over the moviemaking business.
Researchers have developed a new algorithm that can create a movie from just a few lines of text. Prior to this, images could be created from text, and technologies like FAIR’s Detectron could identify objects in images. But putting all of that together to make a movie had not been done before.
The algorithm is built on machine learning. It is essentially a series of neural networks that operate in two stages:
- The first stage creates an idea, or outline, of the video. The outline consists of a blurred image of the background and another blurred image of the main action in the frame
- The second stage combines the two: it takes the blurred background and the blurred main action and creates a video out of them
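The data flow of the two stages can be sketched in a few lines of Python. This is only an illustration of the pipeline shape, not the researchers' model: the function names `make_gist` and `make_video`, the tiny 8x8 frame size, and the simple blending rule are all assumptions, and seeded random numbers stand in for what the neural networks would actually produce.

```python
import random

FRAMES = 32  # clip length reported in the article
SIZE = 8     # hypothetical tiny frame side, chosen only to keep the demo small


def make_gist(text, seed=0):
    """Stage 1 stand-in: return a coarse background and a coarse action image.

    A real system would condition a neural network on the text. Here the text
    only seeds a random generator so the output is repeatable.
    """
    rng = random.Random(f"{seed}:{text}")
    background = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
    action = [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]
    return background, action


def make_video(background, action, frames=FRAMES):
    """Stage 2 stand-in: blend the action over the background, frame by frame."""
    video = []
    for t in range(frames):
        shift = t % SIZE  # slide the action sideways to fake motion
        frame = [
            [0.5 * background[r][c] + 0.5 * action[r][(c - shift) % SIZE]
             for c in range(SIZE)]
            for r in range(SIZE)
        ]
        video.append(frame)
    return video


background, action = make_gist("playing golf on grass")
video = make_video(background, action)
print(len(video), len(video[0]), len(video[0][0]))  # frames, rows, columns
```

The point of the split is that the first function only has to decide roughly what the scene looks like, while the second turns that rough picture into a sequence of frames.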
So how does the neural network learn? A second network sees the video generated to illustrate an action (for instance, a player scoring a goal) alongside an actual video of a player scoring a goal, and it is trained to pick the real one. The more data it is fed, the better it becomes at telling the two apart. The feedback it gives keeps setting a higher standard for the generator network.
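This kind of training is usually called adversarial. A minimal sketch of the two competing objectives follows, assuming the standard binary cross-entropy formulation; the article does not say which loss the researchers used, and the function names and the example scores are made up for illustration.

```python
import math


def sigmoid(z):
    """Squash a raw score into a probability that the video is real."""
    return 1.0 / (1.0 + math.exp(-z))


def discriminator_loss(score_real, score_fake):
    """Low when the real video is rated real and the generated one is rated fake."""
    return -(math.log(sigmoid(score_real)) + math.log(1.0 - sigmoid(score_fake)))


def generator_loss(score_fake):
    """Low when the judging network is fooled into rating the generated video real."""
    return -math.log(sigmoid(score_fake))


# Hypothetical raw scores from the judging network.
undecided = discriminator_loss(0.0, 0.0)   # cannot tell the videos apart
confident = discriminator_loss(3.0, -3.0)  # picks the real video confidently
print(undecided > confident)                          # True
print(generator_loss(-3.0) > generator_loss(3.0))     # True
```

As the judging network gets better at spotting generated clips, the generator's loss rises, which is the "higher standard" the generator then has to meet.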
The developers trained this algorithm on ten types of scenes, including “kitesurfing on the sea” and “playing golf on grass”. The results were modest: the generated videos were grainy. However, according to a report in Science, “a simple classification algorithm correctly guessed the intended action among six choices about half the time”.
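To put "about half the time" in context, it helps to compare it with random guessing. The quick calculation below assumes the six choices are equally likely, which the article does not state.

```python
choices = 6
reported_accuracy = 0.5  # "about half the time", per the quote above

# Assumption: a random guesser picks each of the six actions with equal probability.
chance = 1 / choices

print(f"chance: {chance:.1%}, reported: {reported_accuracy:.0%}, "
      f"ratio: {reported_accuracy / chance:.1f}x")
# prints "chance: 16.7%, reported: 50%, ratio: 3.0x"
```

Under that assumption, the classifier recognized the intended action about three times as often as guessing would.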
As of today, the videos are 32 frames long and roughly the size of a postage stamp. When the researchers tried to increase the size, prediction accuracy dropped drastically.
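One way to get a feel for the scale involved is to count how many numbers the generator has to produce per clip. The sketch below is illustrative only: the article gives the frame count but not the pixel dimensions, so the 64x64 and 128x128 sizes and the three color channels are assumptions.

```python
def values_per_clip(frames, height, width, channels=3):
    """Number of pixel values in one clip of the given shape."""
    return frames * height * width * channels


# Hypothetical frame sizes; only the 32-frame length comes from the article.
small = values_per_clip(32, 64, 64)
large = values_per_clip(32, 128, 128)

print(small)          # 393216
print(large)          # 1572864
print(large // small) # 4
```

Doubling the side length of each frame quadruples the number of values to generate, which may be part of why larger videos proved harder.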
You can read the research paper on how video is generated from text here.
Our take on this
Even though these are very early days for this algorithm, its applications could well extend beyond moviemaking. It could potentially generate training data to help train other neural networks.
For example, it could help autonomous cars train on fatal or dangerous situations they haven’t seen before. It could also help teach robots menial tasks and support healthcare research, among other things.
Given how quickly deep learning research seems to be progressing these days, it should not be long before the algorithm becomes far more capable and finds wide-ranging applications.