Facebook engineers used AI to clone Bill Gates' voice—here's what it sounds like


It's not hard to find a clip of Bill Gates talking about anything from the early days of Microsoft to his friendship with fellow billionaire Warren Buffett. As one of the most famous and wealthiest people alive, he has been a regular presence on television, on radio and in other interview formats for several decades.

But now you can even listen to Gates' voice saying things the billionaire has never actually said. That's because two Facebook engineers have created an artificial intelligence system that clones the voices of famous people.

Sean Vasquez and Mike Lewis used AI to develop a sophisticated computer-generated speech system called MelNet, according to their recently published research paper. The system, which relies on machine learning, has already produced convincing audio clips that match the voice of Bill Gates and a handful of other famous speakers, from primatologist Jane Goodall to the late theoretical physicist Stephen Hawking.

MelNet's clips of those famous individuals' AI-cloned voices are posted online under the heading "Selected Speakers." The samples of Gates' voice are two- to three-second clips featuring simple test phrases, like "A cramp is no small danger on a swim." Or "The glow deepened in the eyes of the sweet girl." Though the sentences seem nonsensical, they are actually taken from a collection called the "Harvard Sentences," phonetically balanced phrases that engineers often use to test communications systems and artificial voice programs.

Even though these clips were generated by a computer, they sound remarkably like the Microsoft cofounder's actual voice. For comparison, Gates can be heard discussing the possibility of increased regulation in the tech industry in an interview on CNBC.

One reason the Facebook engineers working on MelNet picked Gates as a test subject was that the project used hundreds of hours of TED talks, including some of Gates' own, as a starting point to train the AI system. Essentially, programs like MelNet generate voice clones by feeding large amounts of audio data into AI algorithms that analyze a human voice until they can mimic it convincingly.

MelNet is far from the first example of AI-generated audio or video to make waves. Last month, a team at AI startup Dessa created a fairly realistic AI-generated audio clip that clones popular podcast host Joe Rogan's voice speaking a series of short sentences over nearly two minutes.

Of course, these AI-generated clips aren't perfect clones of the voices of Gates or the other test subjects. As the MIT Technology Review points out, mimicking a person's exact voice over longer stretches is more difficult, because machine learning programs still struggle to replicate the shifts in speech patterns and tone that occur over the course of a long story or speech.

There are plenty of potential applications for this type of technology, including improving the voices of AI-powered voice assistants. However, AI-generated audio and video have also raised growing concern that the technology could be used maliciously. For instance, the rise of "deepfakes" and other artificially altered videos and audio, such as recently doctored videos of House Speaker Nancy Pelosi and President Donald Trump, has caused alarm, as people wonder whether advancing technology will eventually make it impossible to trust their own eyes and ears.

