Artificial Intelligence (AI) researchers are fine-tuning audio technology that can transform your voice to sound like a celebrity, the opposite sex, or someone much younger or older than you. The only problem: increasing numbers of criminals, hatemongers, propagandists, and the like see the technology as a great way to commit crimes, spread misinformation, promote hate speech, and more.
"It is not an exaggeration to say that we are at the cusp of a looming crisis of AI-generated fake media," says Siwei Lyu, a professor specializing in digital media forensics, computer vision and machine learning at the State University of New York (SUNY) at Albany.
In the long term, voice conversion researchers see myriad commercial applications for their tech.
Software like Google's Parrotron, for example, could be commercialized to offer voice conversions of phrases spoken with an Australian accent into phrases spoken in American English or other accents.
Music companies will be tempted to use AI voice conversion to issue new music from entertainment icons who either no longer perform or are no longer with us. Yamaha, for example, used voice conversion technology to issue a new song by deceased Japanese rock star Hideto Matsumoto in 2014, 16 years after his death, according to Junichi Yamagishi, a professor specializing in digital content and media science at the National Institute of Informatics, Tokyo, who is a globally recognized expert in voice conversion.
Google research scientist Fadi Biadsy and Google Brain software engineer Ron Weiss also see great promise in voice conversion for people with speech impairments. Google Parrotron, for example, has already been trained to transform the hard-to-understand speech of people with hearing or muscular afflictions into broadcast-quality speech.
However, Terah Lyons, founding executive director of The Partnership on AI, says many of her member organizations worry that voice conversion technology and similar tech could also be used nefariously. Specifically, Lyons wrote in a Partnership on AI blog post that the ability to create synthetic or manipulated content that is difficult to discern from real events frames the urgent need for developing new capabilities for detecting such content, and for authenticating trusted media and news sources.
Fake audio, for example, offers criminals and others the ability to fool voice recognition systems and break into the computers and physical spaces they guard. Mimicked voices can also be used to spread false news stories using sound bites mimicking celebrities, prominent government officials, and the like. Plus, fake voices can be used to impersonate top company officers.
Fake audio's 'sister' technology, fake video, simply ups the ante, offering criminals, propagandists, and others the added credibility that video brings to a sound bite.
Nitesh Saxena, a computer science professor and research director at the University of Alabama at Birmingham, says deepfake audio and video put certain populations at extreme risk. "More work is needed to understand the susceptibility of more vulnerable populations such as elderly or those with mental or medical conditions," Saxena says.
Widespread proliferation of fake audio and video also offers purveyors of falsehoods greater opportunity to characterize authentic audio and video as inauthentic, according to Tomi H. Kinnunen, associate professor and researcher specializing in computerized speech at the University of Eastern Finland.
After all, if the world's news media is overrun by fake audio and fake video, who will really be able to easily say what is real and what is fake?
The threat is seen as so substantial that The Partnership on AI, whose founding members include technology heavyweights like Apple, Amazon, Facebook, Intel, and IBM, has coalesced behind an initiative to fight the scourge, according to Lyons. Members of the initiative's steering committee include Laura Ellis, head of technology forecasting at the BBC; Irina Kofman, director and business lead at Facebook AI; and Jay Stokes, research software engineer at Microsoft.
"We are in the initial shocking time right now, just as when we first heard about computer virus, or network hacking," said SUNY Albany's Lyu. In the long term, Lyu is hopeful fake news will be thwarted by a combination of detection tools and an unwavering vigilance by those seeking to quash it.
Supasorn Suwajanakorn, a faculty member specializing in computer vision research at the Vidyasirimedhi Institute of Science and Technology in Thailand, agrees. "I don't think there will be a 100% foolproof detection tool. And I think this is a race we can never win or lose."
However, Suwajanakorn adds, "I do think it's very important to keep playing the catch-up game, because that's how we keep deepfakes less-appealing as a political tool, or for personal gain."
Serge Belongie, a computer science professor at Cornell University, said, "I think it is unlikely computer scientists will develop a silver bullet for this problem, but it is likely that we can develop a battery of defenses that can keep us one step ahead in the cat and mouse game.
"One sign of success is if the deepfake hackers retreat to low-resolution, low-frame-rate formats for their videos, for fear that their tampering will be more detectable in higher-fidelity formats."
Part of the solution will likely include pinpointing bad actors and relentlessly exposing them wherever they surface, according to Shrikanth S. Narayanan, a professor of electrical and computer engineering at the University of Southern California.
Glass-half-full optimists like Suwajanakorn hope the 'good guys' will prevail over deepfakes. "I believe our society has a mechanism for dealing with potential threats, and when the time comes, enough people will work toward the countermeasures: public awareness campaigns, development of technical tools from researchers, discussions from policy makers."
Lyu agreed, emphasizing the need for all facets of society to coalesce around the problem. "Fighting AI-generated fake media is not merely a technical problem. It requires a joint force from the government agencies, platform companies, media outlets, law enforcement, and the ordinary users."
Joe Dysart is an Internet speaker and business consultant based in Manhattan, NY, USA.