OpenAI unveils ‘Voice Engine’: Mimics human speech with just 15-second audio samples

OpenAI, known for AI products such as its video generator Sora, has introduced ‘Voice Engine,’ a voice cloning tool. The audio model can replicate the nuances of human speech, including intonation and distinctive speech patterns, from a single 15-second sample of the original voice. OpenAI has so far held the tool back from general release, citing concerns over potential misuse and the proliferation of fake content online.

Remarkable Efficiency and Precision

“Incredibly, our Voice Engine can craft emotive and lifelike voices using just a single 15-second sample,” the company stated in a recent blog post.

OpenAI’s Voice Engine Versus Industry Standards

In contrast, existing AI voice platforms like ElevenLabs typically require longer samples, with their instant voice cloning tool necessitating at least one minute of audio for operation. For optimal results, approximately 10 minutes of continuous speech are recommended, particularly for professional-grade services.

OpenAI showcased the capabilities of Voice Engine through various demonstrations, including a poignant example where the voice of a young patient, who had lost much of her speaking ability due to a brain tumour, was replicated using an older recording from a school project. The technology enabled her to communicate using her own voice, a feat made possible through collaboration with Lifespan, a nonprofit associated with Brown University’s medical school.

Moreover, OpenAI revealed partnerships with organisations like HeyGen, demonstrating how Voice Engine facilitates natural-sounding translations of speech from one language to another.

According to OpenAI, Voice Engine was initially developed in late 2022 and is already integrated into the preset voices available in OpenAI’s text-to-speech API, as well as ChatGPT’s Voice and Read Aloud features. With these latest advancements, the company is proceeding with caution before a wider release.
