Artificial Intelligence Is Killing the Uncanny Valley and Our Grasp on Reality

AI-generated video, photos, and audio that mimic the real world are already here. Now we get to live among them.

There’s a revolution afoot, and you will know it by the stripes.

Earlier this year, a group of Berkeley researchers released a pair of videos. In one, a horse trots behind a chain-link fence. In the other, the same horse is suddenly sporting a zebra’s black-and-white pattern. The execution isn’t flawless, but the stripes fit the horse so neatly that it throws the equine family tree into chaos.

Turning a horse into a zebra is a nice stunt, but that’s not all it is. It is also a sign of the growing power of machine learning algorithms to rewrite reality. Other tinkerers, for example, have used the zebrafication tool to turn shots of black bears into believable photos of pandas, apples into oranges, and cats into dogs. A Redditor used a different machine learning algorithm to edit porn videos to feature the faces of celebrities. At a new startup called Lyrebird, machine learning experts are synthesizing convincing audio from one-minute samples of a person’s voice. And the engineers developing Adobe’s artificial intelligence platform, called Sensei, are infusing machine learning into a variety of groundbreaking video, photo, and audio editing tools. These projects are wildly different in origin and intent, yet they have one thing in common: They are producing artificial scenes and sounds that look stunningly close to actual footage of the physical world. Unlike earlier experiments with AI-generated media, these look and sound real.

The technologies underlying this shift will soon push us into new creative realms, amplifying the capabilities of today’s artists and elevating amateurs to the level of seasoned pros. We will search for new definitions of creativity that extend the umbrella to the output of machines. But this boom will have a dark side, too. Some AI-generated content will be used to deceive, kicking off fears of an avalanche of algorithmic fake news. Old debates about whether an image was doctored will give way to new ones about the pedigree of all kinds of content, including text. You’ll find yourself wondering, if you haven’t yet: What role did humans play, if any, in the creation of that album/TV series/clickbait article?

A world awash in AI-generated content is a classic case of a utopia that is also a dystopia. It’s messy, it’s beautiful, and it’s already here.

Until recently, there were two ways to produce audio or video that resembles the real world. The first is to use cameras and microphones to record a moment in time, such as the original Moon landing. The second is to leverage human talent, often at great expense, to commission a facsimile. So if the Moon descent had been a hoax, a skilled film team would have had to carefully stage Neil Armstrong’s lunar gambol. Machine learning algorithms now offer a third option, letting anyone with a modicum of technical knowledge algorithmically remix existing content to generate new material.

At first, deep-learning-generated content wasn’t geared toward photorealism. Google’s DeepDream, released in 2015, was an early example of using deep learning to crank out psychedelic landscapes and many-eyed grotesques. In 2016, a popular photo editing app called Prisma used deep learning to power artistic photo filters, turning snapshots into homages to Mondrian or Munch. The technique underlying Prisma is known as style transfer: take the style of one image (such as The Scream) and apply it to a second shot.
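In code, the classic formulation of style transfer (from Gatys et al., the standard version of what apps like Prisma build on) optimizes a new image against two losses taken from a pretrained network: one matching the content photo’s activations, one matching the style image’s texture statistics. Below is a minimal PyTorch sketch of that idea; the layer choices, loss weights, and file names are illustrative, not Prisma’s actual recipe.

```python
# Minimal sketch of Gatys-style neural style transfer, assuming PyTorch and
# torchvision are installed and that content.jpg / style.jpg exist locally.
# Layer indices and loss weights are illustrative, not Prisma's recipe.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms
from torchvision.utils import save_image

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg19(pretrained=True).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

prep = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])

def load(path):
    return prep(Image.open(path).convert("RGB")).unsqueeze(0).to(device)

def features(x, layers=("0", "5", "10", "19", "28")):
    # Collect activations at a handful of VGG19 conv layers.
    feats = {}
    for name, layer in vgg._modules.items():
        x = layer(x)
        if name in layers:
            feats[name] = x
    return feats

def gram(f):
    # The Gram matrix of feature maps captures texture statistics: the "style".
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

content, style = load("content.jpg"), load("style.jpg")
style_grams = {k: gram(v) for k, v in features(style).items()}
content_feats = features(content)

target = content.clone().requires_grad_(True)
opt = torch.optim.Adam([target], lr=0.02)

for step in range(300):
    feats = features(target)
    # Content loss keeps the scene; style loss pushes textures toward the style.
    c_loss = F.mse_loss(feats["19"], content_feats["19"])
    s_loss = sum(F.mse_loss(gram(feats[k]), style_grams[k]) for k in style_grams)
    loss = c_loss + 1e5 * s_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

save_image(target.clamp(0, 1), "stylized.jpg")
```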

Now the algorithms powering style transfer are gaining precision, signaling the end of the Uncanny Valley—the sense of unease that realistic computer-generated humans typically elicit. In contrast to those earlier, somewhat crude effects, tricks like zebrafication are starting to fill in the Valley’s lower basin. Consider the work from Kavita Bala’s lab at Cornell, where deep learning can infuse one photo’s style, such as a twinkly nighttime ambience, into a snapshot of a drab metropolis—and fool human reviewers into thinking the composite place is real. Inspired by the potential of artificial intelligence to discern aesthetic qualities, Bala cofounded a company called GrokStyle around this idea. Say you admired the throw pillows on a friend’s couch, or a magazine spread caught your eye. Feed GrokStyle’s algorithm an image, and it will surface similar objects with that look.
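GrokStyle hasn’t published how its system works, but visual search of this kind is commonly built as embedding retrieval: encode every catalog image with a pretrained network, then rank the catalog by similarity to the query photo’s embedding. Here is a generic sketch of that approach, with hypothetical file names; it is not GrokStyle’s actual pipeline.

```python
# Generic sketch of embedding-based visual search, NOT GrokStyle's system:
# encode images with a pretrained CNN, then rank a catalog by cosine
# similarity to the query. All file names here are hypothetical.
import torch
from PIL import Image
from torchvision import models, transforms

encoder = models.resnet50(pretrained=True)
encoder.fc = torch.nn.Identity()  # drop the classifier; keep 2048-d features
encoder.eval()

prep = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    with torch.no_grad():
        v = encoder(prep(Image.open(path).convert("RGB")).unsqueeze(0))
    return torch.nn.functional.normalize(v, dim=1)  # unit-length embedding

catalog = ["pillow1.jpg", "pillow2.jpg", "lamp.jpg"]  # hypothetical catalog
vectors = torch.cat([embed(p) for p in catalog])

query = embed("friends_couch.jpg")            # the photo whose look you admired
scores = (vectors @ query.t()).squeeze(1)     # cosine similarity per item
ranked = scores.argsort(descending=True)
print([catalog[i] for i in ranked])           # most similar items first
```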

“What I like about these technologies is they are democratizing design and style,” Bala says. “I’m a technologist—I appreciate beauty and style but can’t produce it worth a damn. So this work makes it available to me. And there’s a joy in making it available to others, so people can play with beauty. Just because we are not gifted on this certain axis doesn’t mean we have to live in a dreary land.”

Machine learning has been part of Adobe’s creative products for well over a decade, but only recently has AI become transformative. In October, engineers working on Sensei, the company’s set of AI technologies, showed off a prospective video editing tool called Adobe Cloak, which lets its user seamlessly remove, say, a lamppost from a video clip—a task that would ordinarily be excruciating for an experienced human editor. Another experiment, called Project Puppetron, applies an artistic style to a video in real time. For example, it can take a live feed of a person and render him as a chatty bronze statue or a hand-drawn cartoon. “People can basically do a performance in front of a webcam or any camera and turn that into animation, in real time,” says Jon Brandt, senior principal scientist and director of Adobe Research. (Sensei’s experiments don’t always turn into commercial products.)

Machine learning makes these projects possible because it can understand the parts of a face or the difference between foreground and background better than previous approaches in computer vision. Sensei’s tools let artists work with concepts, rather than the raw material. “Photoshop is great at manipulating pixels, but what people are trying to do is manipulate the content that is represented by the pixels,” Brandt explains.
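A hedged illustration of the kind of understanding Brandt means: an off-the-shelf segmentation network (not Adobe’s Sensei) can label which pixels belong to a person, giving an editing tool a handle on “the person” rather than on raw pixels. The file name below is a placeholder.

```python
# Sketch of foreground/background understanding via semantic segmentation,
# using a standard pretrained torchvision model (not Adobe's technology).
import torch
from PIL import Image
from torchvision import models, transforms

model = models.segmentation.deeplabv3_resnet50(pretrained=True).eval()
prep = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("frame.jpg").convert("RGB")  # any video frame (placeholder)
with torch.no_grad():
    logits = model(prep(img).unsqueeze(0))["out"]  # (1, 21, H, W) class scores
labels = logits.argmax(1)[0]  # per-pixel class index
mask = labels == 15           # 15 = "person" in the Pascal VOC label set
# `mask` now marks foreground pixels: an editor can operate on the concept
# ("the person") instead of raw pixels, e.g. by inpainting where mask is True.
```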

That’s a good thing. When artists no longer waste their time wrangling individual dots on a screen, their productivity increases, and perhaps also their ingenuity, says Brandt. “I am excited about the possibility of new art forms emerging, which I expect will be coming.”

But it’s not hard to see how this creative explosion could all go very wrong. For Yuanshun Yao, a University of Chicago graduate student, it was a fake video that set him on his current project, probing some of the dangers of machine learning. He had hit play on a clip of an AI-generated, very real-looking Barack Obama giving a speech, and got to thinking: Could he do something similar with text?

A text composition needs to be nearly perfect to deceive most readers, so he started with a forgiving target: fake online reviews for platforms like Yelp or Amazon. A review can be just a few sentences long, and readers don’t expect high-quality writing. So he and his colleagues designed a neural network that spat out Yelp-style blurbs of about five sentences each. Out came a bank of reviews that declared such things as, “Our favorite spot for sure!” and “I went with my brother and we had the vegetarian pasta and it was delicious.” He then asked humans to guess whether the reviews were real or fake, and sure enough, the humans were often fooled.
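One common way to build such a generator is a character-level language model: train a recurrent network to predict the next character of real reviews, then sample from it one character at a time. The toy sketch below shows that generate-by-sampling loop; the stand-in corpus and deliberately tiny LSTM are illustrative, not the researchers’ actual setup.

```python
# Toy sketch of a character-level review generator: a tiny LSTM trained to
# predict the next character, then sampled to produce new text. The one-line
# corpus is a stand-in; real systems train on large sets of actual reviews.
import torch
import torch.nn as nn

corpus = "our favorite spot for sure! the vegetarian pasta was delicious. "
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class CharLM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

model = CharLM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.tensor([stoi[c] for c in corpus]).unsqueeze(0)

for step in range(200):  # next-character prediction on the toy corpus
    logits, _ = model(data[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, len(chars)), data[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Generate a "review" by feeding the model its own output, one char at a time.
x, state, out = data[:, :1], None, []
for _ in range(120):
    logits, state = model(x, state)
    probs = torch.softmax(logits[:, -1] / 0.7, dim=-1)  # temperature 0.7
    x = torch.multinomial(probs, 1)
    out.append(itos[x.item()])
print("".join(out))
```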

With fake reviews costing around $10 to $50 each from micro-task marketplaces, Yao figured it was just a matter of time before a motivated engineer tried to automate the process, driving down the price and kicking off a plague of false reviews. (He also explored using neural nets to defend a platform against fake content, with some success.) “As far as we know there are not any such systems, yet,” Yao says. “But maybe in five or ten years, we will be surrounded by AI-generated stuff.” His next target? Generating convincing news articles.

Progress on videos may move faster. Hany Farid, an expert at detecting fake photos and videos and a professor at Dartmouth, worries about how fast viral content spreads, and how slow the verification process is. Farid imagines a near future in which a convincing fake video of President Trump ordering the total nuclear annihilation of North Korea goes viral and incites panic, like a recast War of the Worlds for the AI era. “I try not to make hysterical predictions, but I don’t think this is far-fetched,” he says. “This is in the realm of what’s possible today.”

Fake Trump speeches are already circulating on the internet, a product of Lyrebird, the voice synthesis startup—though in the audio clips the company has shared with the public, Trump keeps his finger off the button, limiting himself to praising Lyrebird. Jose Sotelo, the company’s cofounder and CEO, argues that the technology is inevitable, so he and his colleagues might as well be the ones to do it, with ethical guidelines in place. He believes that the best defense, for now, is raising awareness of what machine learning is capable of. “If you were to see a picture of me on the moon, you would think it’s probably some image editing software,” Sotelo says. “But if you hear convincing audio of your best friend saying bad things about you, you might get worried. It’s a really new technology and a really challenging problem.”

Likely nothing can stop the coming wave of AI-generated content—if we even wanted to. At its worst, scammers and political operatives will deploy machine learning algorithms to generate untold volumes of misinformation. Because social networks selectively transmit the most attention-grabbing content, these systems’ output will evolve to be maximally likable, clickable, and shareable.

But at its best, AI-generated content is likely to heal our social fabric in as many ways as it may rend it. Sotelo of Lyrebird dreams of how his company’s technology could restore speech to people who have lost their voice to diseases such as ALS or cancer. That horse-to-zebra video out of Berkeley? It was a side effect of work to improve how we train self-driving cars. Often, driving software is trained in virtual environments first, but a world like Grand Theft Auto only roughly resembles reality. The zebrafication algorithm was designed to shrink the distance between the virtual environment and the real world, ultimately making self-driving cars safer.
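That training trick is widely identified with CycleGAN, out of Berkeley’s AI lab. Assuming that formulation, the core objective pairs an adversarial loss with a cycle-consistency loss: translating an image to the other domain and back should return the original. A minimal sketch, with placeholder networks standing in for the real deep generators and discriminators:

```python
# Minimal sketch of the cycle-consistent translation objective (the CycleGAN
# formulation). The 1x1-conv "networks" are placeholders just to show shapes
# flowing through the loss; nothing is actually learned in this toy run.
import torch
import torch.nn as nn

def generator_loss(G, F, D_X, D_Y, real_x, real_y, lam=10.0):
    fake_y = G(real_x)  # e.g. horse -> zebra
    fake_x = F(real_y)  # e.g. zebra -> horse
    # Adversarial terms: each translation should fool the target domain's
    # discriminator (least-squares GAN variant).
    adv = ((D_Y(fake_y) - 1) ** 2).mean() + ((D_X(fake_x) - 1) ** 2).mean()
    # Cycle-consistency terms: translating there and back should return
    # the original image, which keeps translations faithful.
    cyc = (F(fake_y) - real_x).abs().mean() + (G(fake_x) - real_y).abs().mean()
    return adv + lam * cyc

G, F = nn.Conv2d(3, 3, 1), nn.Conv2d(3, 3, 1)      # placeholder generators
D_X, D_Y = nn.Conv2d(3, 1, 1), nn.Conv2d(3, 1, 1)  # placeholder discriminators
horses = torch.randn(4, 3, 64, 64)
zebras = torch.randn(4, 3, 64, 64)
print(generator_loss(G, F, D_X, D_Y, horses, zebras).item())
```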

These are the two edges of the AI sword. As it improves, it mimics human actions more and more closely. Eventually, it has no choice but to become all too human: capable of good and evil in equal measure.