I am working on a project in which I need to create a custom voice engine for my application. I have seen tools like TTS Builder, but does anyone understand how applications such as TTS Builder itself are developed? What is behind SAPI engines? How do they work? How can one build one's own? Can I develop my own algorithm? I would prefer to do this in C# if possible.
Answer 1:
From what I can see, TTS Builder takes existing voices and lets you tweak minor parameters to produce a slightly different-sounding voice. Creating a voice with a different accent or pronunciation, however, is I think considerably more complex.
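For a sense of where that line falls in C#: the sketch below uses the standard System.Speech.Synthesis wrapper over SAPI (the class and member names are the real .NET API; the program itself is only an illustration, not anything from TTS Builder). It shows what you can do with voices that already exist on the machine, namely enumerate them, select one, and adjust parameters such as rate and volume. Writing a new SAPI engine with its own synthesis algorithm is a separate, much larger job that this managed API does not cover.

```csharp
// Minimal sketch (not TTS Builder's actual code): driving existing SAPI
// voices from C# via System.Speech. Add a reference to the System.Speech
// assembly, or the System.Speech NuGet package on modern .NET.
using System;
using System.Speech.Synthesis;

class VoiceTweakDemo
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // List the SAPI voices installed on this machine.
            var voices = synth.GetInstalledVoices();
            foreach (InstalledVoice voice in voices)
                Console.WriteLine(voice.VoiceInfo.Name);

            // Pick an installed voice and tweak its delivery; this is
            // roughly the level of customization a tool like TTS Builder
            // exposes, on top of a voice someone else already built.
            if (voices.Count > 0)
                synth.SelectVoice(voices[0].VoiceInfo.Name);
            synth.Rate = -2;    // speaking rate, range -10..10
            synth.Volume = 90;  // range 0..100

            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("An existing voice with adjusted parameters.");
        }
    }
}
```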
From AT&T Research:
Creating high-quality voices requires a good voice talent, a sound-proof room, professional audio equipment, hours of written material with thorough coverage of phoneme combinations in the language, and the time and expertise to turn those recordings into a decent synthetic voice. Because of the expense involved, custom voice builds are usually done for corporations that want to computerize an existing actor's voice, for example to continue a brand image.
...
It may take far less material to build a transformation model than it does to build a TTS voice from scratch.