ai audiobooks, question 

Can someone explain the difference between the 'ai' used for audio books and text-to-speech readers?

Is Audible just using a buzzword for something that's been available for years or are they actually doing something new?

ai audiobooks, question 

@JessMahler From what I understand, the AI voices are cloned voices made with something like ElevenLabs, whereas Virtual Voice uses traditional TTS.

re: ai audiobooks, question 

@zersiax
I'm afraid that doesn't mean anything to me.

re: ai audiobooks, question 

@JessMahler Sorry :) So basically, traditional TTS sounds pretty robotic, think Google Maps' voice. ElevenLabs is a company that lets you "clone" a voice based on recordings of somebody's voice, which means the results sound a lot more "human". Generally still flatter and less inflected than a voice actor, but better than a standard text-to-speech voice. How much better depends on the quality of the voice recordings, the settings for the generation, etc.

re: ai audiobooks, question 

@zersiax
Ok, why are they calling that AI? It doesn't sound related to the generative AI I'm used to hearing about, or the analytical AI used for medical diagnosis.

re: ai audiobooks, question 

@JessMahler Audible isn't doing the AI; the party they're using to create the voices, likely ElevenLabs, is. They just feed it the text and the "AI" generates what the chosen voice would sound like reading it. I guess you could say Audible's just taking the credit in that sense.

re: ai audiobooks, question 

@zersiax
I guess what I'm asking is what makes this AI? How is AI being defined here and is it the same definition as for, for instance, AI image and text generators that 'create' something new?

Because TTS has been using recordings of real people's voices to make it sound better for years (see what Google has done with Read Aloud on its ebooks), and while the ElevenLabs voices definitely sound a lot better, I don't understand what's different about the process of making them that makes it AI and not just another advance in TTS.

re: ai audiobooks, question 

@JessMahler @zersiax They’re using AI to generate the voices, with training methods similar to how ChatGPT was trained to generate text. That lets them create larger corpora of sounds faster and more cheaply than recording a person speaking and breaking that down into its constituent parts. It also theoretically gives improvements, in that AI can do more natural pattern matching as to which of the hundred "a" sounds it’s generated is appropriate than the extremely complex if-then statements a normal TTS would use. (More accurately, I think it actually generates the new clips on the fly, but for discussion’s sake we’ll just focus on how this allows a larger set of sounds.)

However, this is all pretty marginal. It’s mostly just being touted as revolutionary because AI being the hot new thing makes firing all your voice actors okay, whereas previously we judged companies for doing that sort of thing.

re: ai audiobooks, question 

@Satsuma @zersiax
Thanks! That helps a lot.
