OpenAI Can Clone Voices. That Doesn’t Mean It Should.

The potential for misuse is greater than the benefits.

Parmy Olson

03 Apr 2024, 03:26 PM IST


(Bloomberg Opinion) -- The latest new tool from OpenAI is so sensitive and controversial that the company has not released it yet. Known as Voice Engine, it’s a system that “generates natural-sounding speech that closely resembles the original speaker,” with just a 15-second sample of their voice.

Such technology is not new. Startups like Eleven Labs and HeyGen can clone voices with a small sample of audio too. But OpenAI has shown it can launch products that are better than pre-existing rivals. Even so, this was an area OpenAI should have steered clear of entirely. The problem isn’t the technology, but OpenAI’s broader insistence on getting AI into the hands of everyone it can.

The company says that it will decide on whether to deploy Voice Engine “at scale” once it’s conducted small-scale tests and assessed the results of “conversations” over how society will adapt. Cloning voices has obvious risks, which OpenAI notes are high in a big election year, but the company’s goal is also to “understand the technical frontier and openly share what is becoming possible with AI.” Remember that OpenAI is also no longer a non-profit organization but a business obliged to maintain its lead in the AI race it kicked off.

Don’t be surprised if OpenAI eventually releases Voice Engine later this year. The company made similarly cautious noises when it did a partial release of GPT-2, a language model that preceded ChatGPT, in February 2019, citing concerns that spammers would exploit it. Nine months later, it released the full model, saying it had “seen no strong evidence of misuse so far.” But its incentives had also changed. In that same time period, OpenAI became a for-profit company and took on a $1 billion investment from Microsoft Corp.

Is OpenAI really being cautious or using caution as a form of PR? The company’s stated mission is “beneficial AI” for humanity, so its blog post about Voice Engine naturally showed examples of its public benefits, including how the tool could provide voices for patients and disabled people who were non-verbal.

While these are noble goals, accessibility has also long been used to give new technology a benevolent veneer. Text-to-speech software was originally marketed as a tool to help the blind, but it went on to power mainstream applications like Siri, Google Assistant, and GPS navigation systems. Elon Musk has touted his Neuralink chip as one that will help those who are paralyzed, but his long-term goal is also to implant it in billions of human brains.

In reality, artificial intelligence threatens to make life more difficult for disabled people. AI tools used to screen job applicants have inadvertently excluded the disabled, while a 2023 ProPublica investigation found that Cigna, the insurance giant, used an algorithm that allowed doctors to sign off on mass denials disproportionately targeting disabled people. Cigna called ProPublica’s reporting “biased and incomplete.”

OpenAI’s suggestions for guardrails for this technology don’t inspire confidence. It suggests creating a “no-go list” to block the creation of voices that sound too similar to “prominent figures.” But the harmful side effects of voice cloning will hit regular people more than celebrities. The vast majority of the deepfake porn that has proliferated over the past year, thanks to advancements in generative AI, hasn’t affected prominent people but regular young women.

Verifying and authenticating original speakers — as OpenAI intends to do — doesn’t always work either. HeyGen, an AI voice-cloning tool that OpenAI is partnering with on Voice Engine, was recently used to clone the voice of a Ukrainian YouTube influencer without her knowledge or consent, she told me. Olga Loiek spotted the HeyGen watermark on one of hundreds of videos using her body and voice on a Chinese social media app. HeyGen said on its site that it required consent from a person to use their voice. “It’s obvious this part wasn’t working,” Loiek said.

It’s also worth noting that several examples OpenAI gave of the benefits of Voice Engine, such as giving a voice to the non-verbal, don’t require a cloned voice. They just need software that can generate a synthetic one. Copying human speech opens a new can of worms that is simply not worth the risk. It not only provides a tool to fraudsters, trolls, and others peddling misinformation, it will also likely throw a cleaver into the entertainment business and Hollywood itself, where OpenAI has been courting executives and showing off its video generation tool Sora. Voice-cloning tech threatens actors’ livelihoods, as one British actress demonstrated last week when she posted a rejection email saying she’d been replaced by an “AI-generated voice.”

Perhaps OpenAI needs reminding of the old saying that just because you can do something doesn’t mean you should. The company has found itself swept along by the race it sparked with the release of ChatGPT, and it’s now under pressure to maintain its lead by releasing better versions of rival tools and getting more people to use its AI. That’s why it also recently removed the requirement to log in to ChatGPT.

OpenAI still insists that it is driven by its mission to create AI for humanity, but the potential for harm with voice cloning looks far greater and more widespread than the advantages. The company is doing well to keep itself in the race as a business, but it looks increasingly unclear how humanity will benefit.
