Pranav Dar — Published On February 24, 2018


  • The algorithm can predict the next few frames based on the previous frame
  • It is based on a series of neural networks
  • Combines text to image and image detection algorithms
  • Still at a nascent stage, has a lot of improvement areas



Watch out Hollywood. Machine Learning might be extending it’s arms to take over the movie making business.

A new algorithm developed by researchers can now create a movie from just a few lines of text. Prior to this, images could be created from text and technologies like FAIR’s Detectron could identify objects in images. But putting that all together to make a movie – that hasn’t been done before.

The underlying idea behind the algorithm was devised using machine learning. It’s basically a series of neural networks that operates in two stages:

  • The first stage is used to create an idea, or outline, of the video. This means a blurred image of the background and another blurred image of the main action in the frame
  • The second stage combines the two – it takes the blurred background and main action and creates a video out of it

So how does the neural network learn? Basically, it sees the video generated to illustrate the action (for instance, a player scoring a goal) alongside the actual video of the player scoring a goal. Then, it is trained to pick the real video. The more data it is fed, the better it becomes in terms of prediction accuracy. The feedback it gives keeps setting a higher standard for the generator network.

The developers trained this algorithm on ten types of scenes, including “kitesurfing on the sea” and “playing golf on grass”. The results were not that great – it resulted in a grainy video. However, according to a post by Science, “a simple classification algorithm correctly guessed the intended action among six choices about half the time”.

As of today, the videos are 32 frames long. Their size is roughly that of a stamp. When the researchers tried to increase the size, it reduced the prediction accuracy drastically.

You can read the research paper on how video is generated from text here.


Our take on this

Even though these are very early days for this algorithm, the applications could well be beyond movie making. It could potentially generate training data to help train other neural networks.

For example, it could help the autonomous cars train on fatal or dangerous situations it hasn’t seen before. It could train robots menial tasks, help in healthcare research, among other varying things.

Given how quickly research seems to be progressing in deep learning these days, it should not be long before the algorithm improves to a much more efficient rate with far ranging applications.


Subscribe to AVBytes here to get regular data science, machine learning and AI updates in your inbox!


About the Author

Pranav Dar
Pranav Dar

Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.

Our Top Authors

Download Analytics Vidhya App for the Latest blog/Article

Leave a Reply Your email address will not be published. Required fields are marked *