Microsoft’s New AI Bot Can Draw Images Based on Captions

Pranav Dar 20 Jan, 2018 • 2 min read

Microsoft has built an AI-powered bot that can draw images based on the text it is given. The image below, published by Microsoft, shows a yellow-and-black bird that was generated entirely by the bot.

Source: Microsoft

For now, Microsoft is simply calling this new technology the “drawing bot”. It can generate images ranging from animals to scenic hillsides, and even outlandish things like flying cars and twisted street lamps. It’s essentially an AI version of Pictionary, where players draw whatever is written on a cue card. The difference is that you type a description, and the bot runs its algorithm and produces the image.

The most exciting part of the technology is that the generated images need not depict real things. The bird in the image above, for instance, may not exist at all; it is the machine’s own rendering of what such a bird might look like. Each image also contains details that were not specified in the text description.

Once the bot is made available, Microsoft sees it being used by painters and interior decorators. It could also serve as a voice-activated tool for creating or refining photos (perhaps there’s a role for Cortana there).

To learn which words go with which pictures, the drawing bot was trained on pairs of images and captions. The model is a GAN (Generative Adversarial Network), which is made up of two parts:

  • Generator – creates images based on the text
  • Discriminator – judges the quality of the generated images, pushing the generator to improve
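To make the generator/discriminator split more concrete, here is a minimal toy sketch in Python of one forward pass through a text-conditioned GAN and its two losses. It is not Microsoft’s model: the tiny sizes, the random single-layer networks, and every name in it (`Generator`, `Discriminator`, `gan_losses`, the hand-written `caption` and `real` vectors) are hypothetical stand-ins for the learned text encoder and deep networks a real system would use. It only computes the losses; actual training would alternately update each network’s weights by gradient descent on them.

```python
import math
import random

# Hypothetical toy sizes; real text-to-image models are vastly larger.
TEXT_DIM = 4    # size of a (hypothetical) caption embedding
NOISE_DIM = 3   # size of the random noise vector
IMAGE_DIM = 6   # a toy "image" is a flat list of pixel values in (0, 1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class Generator:
    """Maps (noise, caption embedding) to a fake image."""
    def __init__(self, rng):
        self.w = random_matrix(IMAGE_DIM, NOISE_DIM + TEXT_DIM, rng)

    def __call__(self, noise, text):
        # Conditioning: the caption embedding is concatenated with the noise.
        return [sigmoid(h) for h in matvec(self.w, noise + text)]

class Discriminator:
    """Scores how likely an (image, caption) pair is a real training pair."""
    def __init__(self, rng):
        self.w = random_matrix(1, IMAGE_DIM + TEXT_DIM, rng)

    def __call__(self, image, text):
        return sigmoid(matvec(self.w, image + text)[0])

def gan_losses(gen, disc, real_image, text, rng):
    noise = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]
    fake_image = gen(noise, text)
    d_real = disc(real_image, text)
    d_fake = disc(fake_image, text)
    # The discriminator wants d_real -> 1 and d_fake -> 0.
    d_loss = -(math.log(d_real) + math.log(1.0 - d_fake))
    # The generator wants its image scored as real (non-saturating loss).
    g_loss = -math.log(d_fake)
    return fake_image, d_loss, g_loss

# Usage with made-up stand-in data.
rng = random.Random(0)
gen, disc = Generator(rng), Discriminator(rng)
caption = [0.2, 0.9, 0.1, 0.5]          # stand-in for an encoded caption
real = [0.8, 0.7, 0.1, 0.1, 0.9, 0.2]   # stand-in for its matching image
fake, d_loss, g_loss = gan_losses(gen, disc, real, caption, rng)
print(len(fake), round(d_loss, 3), round(g_loss, 3))
```

Because the generator mixes random noise with the caption, the same caption can yield different images. The discriminator also sees the caption, so in principle it can learn to judge whether an image fits the text, not just whether it looks real.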

Microsoft previously released CaptionBot, which takes images as input and writes captions for them. It followed this up with the Seeing AI tool, which also takes images as input and describes what’s in them. Seeing AI is aimed especially at blind and low-vision people.

Our take on this

While Google launched a similar AI last year that could create doodles, Microsoft’s version is in a different league altogether. It’s not perfect yet, but one can already imagine future uses for such technology. The project’s principal researcher, Xiaodong He, thinks it might even be used to create animated movies from pre-written scripts. Following Google’s AutoML Vision launch yesterday, 2018 is already promising to be a big year for computer vision.


Senior Editor at Analytics Vidhya. Data visualization practitioner who loves reading and delving deeper into the data science and machine learning arts. Always looking for new ways to improve processes using ML and AI.
