At the Interspeech 2021 conference, Nvidia presented a tool that lets AI voices learn the natural pronunciation of words. With the tool, called RAD-TTS, researchers can use recordings of their own voice to train a speech algorithm.
At the GPU Technology Conference in 2017, Nvidia researchers demonstrated the progress they had made in developing artificial intelligence. They also released a synthetic voice at the time, but weren’t entirely satisfied with its performance.
In 2020, a new AI voice was introduced: Flowtron. This artificial voice sounded more natural and human, but the researchers weren’t done yet. The next step, according to the researchers, was to correct the algorithm when it mispronounced something, in the same way humans are corrected: by imitation.
For this purpose, the researchers developed an AI model called RAD-TTS, with which they can teach an AI text-to-speech algorithm how to pronounce a word or group of words. They do this by feeding their own audio recordings to the algorithm and converting them into parameters that the algorithm can then imitate.
With RAD-TTS, the pitch of the recorded voice can also be changed dramatically. This enabled one researcher to transform his male voice into a synthetic female voice, which was used as the voiceover in a promotional video. According to Nvidia, some of the new technology is open source and will be available in the Nvidia NeMo toolkit.
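As a rough illustration of what changing the pitch of a voice means numerically, the sketch below scales a pitch contour (one fundamental-frequency value per audio frame) by a number of semitones. It is not Nvidia's implementation and does not use the RAD-TTS or NeMo APIs. The function name `shift_pitch`, the sample values, and the convention that `0.0` marks an unvoiced frame are all assumptions made for this example.

```python
def shift_pitch(f0_hz, semitones):
    """Scale a pitch contour (Hz per frame) by a number of semitones.

    Assumption for this sketch: a value of 0.0 marks an unvoiced frame
    and is left unchanged.
    """
    # One semitone corresponds to a frequency ratio of 2^(1/12).
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


# Hypothetical pitch values for a few frames of a low-pitched voice.
low_contour = [110.0, 115.0, 0.0, 120.0]

# Raising the contour by 12 semitones doubles every voiced frequency.
raised_contour = shift_pitch(low_contour, 12)
print(raised_contour)
```

Real speech models work with richer parameters than a bare pitch contour, such as timing and loudness, so this only shows the arithmetic behind a pitch shift, not how a convincing synthetic voice is produced.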