Nvidia shows AI model that can modify voices and generate novel sounds

Written by Stephen Nellis

(Reuters) – Nvidia on Monday unveiled a new artificial intelligence model for music and audio production that can modify voices and generate novel sounds – a technology aimed at producers of music, movies and video games.

Nvidia, the world’s largest supplier of chips and software used to create AI systems, said it has no immediate plans to make public the technology, which it calls Fugatto, short for Foundation Generative Audio Transformer Opus 1.

It joins other technologies shown by startups such as Runway and big players like Meta Platforms that can generate audio or video from a text prompt.

Santa Clara, California-based Nvidia’s version generates sounds and music from text descriptions, including novel sounds such as making a trumpet bark like a dog.

What makes it different from other AI technologies is its ability to take in and modify existing audio – for example, taking a line played on a piano and turning it into a line sung by a human voice, or taking a voice recording and changing the language spoken and the emotion expressed.

“If we think about synthetic audio over the past 50 years, music sounds different because of computers, because of synthesizers,” said Bryan Catanzaro, vice president of applied deep learning research at Nvidia. “I think that generative AI is going to bring new capabilities to music, to video games and to ordinary folks who want to create things.”

While companies such as OpenAI are negotiating with Hollywood studios over whether and how AI can be used in the entertainment industry, the relationship between the technology industry and Hollywood has grown strained, especially after Hollywood star Scarlett Johansson accused OpenAI of imitating her voice.

Nvidia’s new model was trained on open-source data, and the company said it is still debating whether and how to release it publicly.

“Any generative technology always carries some risks, because people might use it to generate things that we would prefer they don’t,” Catanzaro said. “We need to be careful about that, which is why we don’t have immediate plans to release this.”

Developers of generative AI models have not yet found a way to prevent abuses of the technology, such as users generating misinformation or infringing copyrights by producing copyrighted characters.

OpenAI and Meta likewise have not said when they plan to release their models that generate audio or video to the public.

(Reporting by Stephen Nellis in San Francisco; Editing by Will Dunham)