Friday, July 19, 2024

OpenAI Previews ‘Voice Engine’ Audio Tool That Can Clone Human Voices With 15 Seconds of Audio

OpenAI is sharing early results from a test of a feature that can read words aloud in a convincing human voice, highlighting a new frontier for artificial intelligence and raising the specter of deepfake risks. The company is sharing early demos and use cases from a small-scale preview of the text-to-speech model, called Voice Engine, which it has shared with about 10 developers so far, a spokesperson said. OpenAI decided against a wider rollout of the feature, which it briefed reporters on earlier this month.

A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators and creatives. The company had initially planned to release the tool to as many as 100 developers through an application process, according to the earlier press briefing.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the company wrote in a blog post Friday. “We’re engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we’re incorporating their feedback as we build.”

Other AI technology has already been used to fake voices in some contexts. In January, a bogus but realistic-sounding phone call purporting to be from President Joe Biden encouraged people in New Hampshire not to vote in the primaries, an event that stoked AI fears ahead of critical global elections.

Unlike OpenAI’s previous efforts at generating audio content, Voice Engine can create speech that sounds like individual people, complete with their specific cadence and intonations. All the software needs is 15 seconds of recorded audio of a person speaking to recreate their voice.

During a demonstration of the tool, Bloomberg listened to a clip of OpenAI Chief Executive Officer Sam Altman briefly explaining the technology in a voice that sounded indistinguishable from his actual speech, but was entirely AI-generated.

“If you have the right audio setup, it is basically a human-caliber voice,” said Jeff Harris, a product lead at OpenAI. “It is a pretty impressive technical quality.” However, Harris said, “There’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”

One of OpenAI’s current developer partners using the tool, the Norman Prince Neurosciences Institute at the not-for-profit health system Lifespan, is using the technology to help patients recover their voice. For example, the tool was used to restore the voice of a young patient who lost her ability to speak clearly due to a brain tumor by replicating her speech from an earlier recording for a school project, the company blog post said.

OpenAI’s custom speech model can translate the audio it generates into different languages. That makes it useful for companies in the audio business, like Spotify Technology SA. Spotify has already used the technology in its own pilot program to translate the podcasts of popular hosts like Lex Fridman. OpenAI also touted other beneficial applications of the technology, such as creating a wider range of voices for educational content for children.

In the testing program, OpenAI is requiring its partners to agree to its usage policies, obtain consent from the original speaker before using their voice, and disclose to listeners that the voices they’re hearing are AI-generated. The company is also installing an inaudible audio watermark to allow it to distinguish whether a piece of audio was created by its tool.

Before deciding whether to release the feature more broadly, OpenAI said it is soliciting feedback from outside experts. “It is important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not,” the company said in the blog post.

OpenAI also wrote that it hopes the preview of its software “motivates the need to bolster societal resilience” against the challenges brought on by more advanced AI technologies. For example, the company called on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. It is also seeking public education about deceptive AI content and more development of techniques for detecting whether audio content is real or AI-generated.

© 2024 Bloomberg L.P.

(This story has not been edited by NDTV staff and is auto-generated from a syndicated feed.)
