Microsoft just unveiled an AI-powered drawing bot that can create images from detailed text descriptions of an object.
From reports of Alibaba’s AI-powered robot outperforming humans in a Stanford reading comprehension test to the launch of Google’s Cloud AutoML, there is no doubt that artificial intelligence is out to change our future. Now, if you think you have had enough AI news this week, yesterday Microsoft unveiled its latest AI venture: the drawing bot.
You might think the technology is nothing new since Google has already launched AutoDraw, an AI bot that can analyze doodles and make clip art suggestions as replacements. Well, they might both be artistically inclined, but Microsoft’s new AI is on a whole new level.
Simply called “drawing bot” by Microsoft researchers, this new AI tech can create images based on the text descriptions of an object. In fact, it can generate images of basically anything, from ordinary pastoral scenes like grazing livestock to the most absurd, such as a floating double-decker bus. Just describe it, and the machine learning system will do the artwork for you.
The Drawing Bot
In a paper published on arXiv.org, the Microsoft researchers wrote that the drawing bot is programmed to pay close attention to individual words and to generate images based on caption-like text descriptions. The bot’s so-called AI imagination enables it to fill in the details that are missing from the text description, thus producing high-quality images.
“If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch,” Xiaodong He, Microsoft’s Deep Learning Technology Center principal researcher and research manager, said. “These birds may not exist in the real world — they are just an aspect of our computer’s imagination of birds.”
The drawing bot’s underlying technology, known as a generative adversarial network (GAN), is reportedly made up of two machine learning models: one, the generator, for producing images from text descriptions, and another, the discriminator, for judging whether the generated images are authentic.
“The generator attempts to get fake pictures past the discriminator; the discriminator never wants to be fooled. Working together, the discriminator pushes the generator toward perfection,” Microsoft explained.
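To make that tug-of-war concrete, here is a minimal sketch of one adversarial training step in plain Python. It is not Microsoft's code: the one-dimensional "images", the parameter names `w`, `c` and `b`, and the learning rate are all assumptions chosen to keep the example tiny. The discriminator scores a sample between 0 (fake) and 1 (real), and the generator simply shifts random noise by a learnable offset.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and fakes score near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Low when the discriminator scores a fake near 1 (it was fooled)."""
    return -math.log(d_fake)


def train_step(params, real_x, z, lr=0.1):
    """One adversarial step on a toy 1-D problem (all names hypothetical).

    Discriminator: D(x) = sigmoid(w * x + c)
    Generator:     G(z) = z + b
    """
    w, c, b = params["w"], params["c"], params["b"]
    fake_x = z + b

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w * real_x + c)
    d_fake = sigmoid(w * fake_x + c)
    grad_w = -(1.0 - d_real) * real_x + d_fake * fake_x
    grad_c = -(1.0 - d_real) + d_fake
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator update: move b so the updated discriminator scores the fake higher.
    d_fake = sigmoid(w * fake_x + c)
    grad_b = -(1.0 - d_fake) * w
    b -= lr * grad_b

    return {"w": w, "c": c, "b": b}


if __name__ == "__main__":
    params = {"w": 0.0, "c": 0.0, "b": 0.0}
    params = train_step(params, real_x=4.0, z=0.0)
    print(params)
```

After a single step from all-zero parameters, the discriminator has learned that larger values look more "real", and the generator has nudged its output in that direction. Repeating the step many times is what lets each model keep pressuring the other.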
Furthermore, the bot is said to be trained on datasets that contain paired images and captions, helping the models learn how to match the words to their respective visual representations. For instance, the GAN will create a bird image if the caption says bird while learning what a bird looks like at the same time.
“That is a fundamental reason why we believe a machine can learn,” He was quoted as saying.
For now, Microsoft noted that the drawing bot can generate better images from simple text descriptions and that the quality declines as the GAN works with more complex descriptions. This is because long, complex sentences are being treated by the network as a single input, losing the detailed information needed by the AI in the process.
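A rough way to see why detail gets lost is to compare collapsing a caption into one vector with weighting its individual words. The sketch below is only an illustration, not the researchers' model: the two-dimensional word vectors and the "color" query are made-up values, and the weighting is ordinary softmax attention.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sentence_vector(word_vectors):
    """Average every word into one vector: the 'single input' view."""
    n = len(word_vectors)
    return [sum(column) / n for column in zip(*word_vectors)]


def attend(query, word_vectors):
    """Weight each word by its similarity to the query, then blend the words."""
    weights = softmax([dot(query, v) for v in word_vectors])
    context = [
        sum(wt * v[i] for wt, v in zip(weights, word_vectors))
        for i in range(len(query))
    ]
    return weights, context


if __name__ == "__main__":
    # Hypothetical embeddings: axis 0 loosely means "color", axis 1 "object".
    words = ["a", "red", "bird"]
    vectors = [[0.1, 0.1], [1.0, 0.0], [0.0, 1.0]]

    print(sentence_vector(vectors))  # color and object are blended equally

    color_query = [2.0, 0.0]  # a part of the image that needs color detail
    weights, context = attend(color_query, vectors)
    for word, weight in zip(words, weights):
        print(word, round(weight, 3))
```

In the averaged vector, "red" and "bird" contribute equally and cannot be told apart, while the attention weights single out "red" for the part of the image that needs color information.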
While the technology is imperfect, Xiaodong He proudly said that the GAN-produced images are “a nearly three-fold improvement over the previous best-in-class GAN and serve as a milestone on the road toward a generic, human-like intelligence that augments human capabilities.”
“For AI and humans to live in the same world, they have to have a way to interact with each other,” He added. “And language and vision are the two most important modalities for humans and machines to interact with each other.”