AI trained on YouTube and podcasts speaks with ums and ahs

[Image: digital waveforms]

An AI can generate more natural-sounding synthetic speech by including pauses

Synthetic speech that varies its rhythms and pauses sounds more human, according to an assessment of an artificial intelligence trained on speech from YouTube and podcasts.

Most artificial intelligence text-to-speech systems are trained on data sets of acted speech, which can make the output sound stilted and one-dimensional. Natural speech, by contrast, uses a wide range of rhythms and patterns to convey different meanings and emotions.

Now, Alexander Rudnicky at Carnegie Mellon University in Pittsburgh, Pennsylvania, …
