Android O new TextToSpeech onRangeStart() callback

Posted 2019-07-16 13:59

The new onRangeStart() callback of the TTS UtteranceProgressListener would let us, for example, highlight individual words of a longer phrase as the TTS engine speaks them. The callback is documented in the Android API reference at https://developer.android.com/reference/android/speech/tts/UtteranceProgressListener.html#onRangeStart(java.lang.String, int, int, int), but I cannot find any information on how to actually define the ranges in a phrase when an app sends that phrase (e.g. a sentence) to the TTS engine for speech generation.

What exactly are these "ranges", and how can they be defined? Or are they predefined as "words", or as anything separated by white space?

More info: I used the Android O emulator and implemented the onRangeStart() callback in my TTS app, then tried different voices from the Google TTS set to see whether any ranges were defined automatically and the callback would be called. Nothing. Maybe the ranges must be defined somehow in the "params" Bundle of the speak() call?

1 answer
欢心
#2 · 2019-07-16 14:26

Google is still not documenting this feature and the latest response to the issue filed in their tracker is "We've deferred this to a future release, but leaving this open for now."

In the meantime, by implementing the onRangeStart() callback in my TTS app and having it print debug output, I can see that the "ranges" are simply words. The callback fires only with the English voices from Google TTS; none of the other TTS voices I have tried so far, from Google or other companies, implement it yet. For example, reading aloud the sentence "This is a sentence to read aloud." produces the following output in onRangeStart():

onRangeStart(avar-1) start=0, end=4, frame=275         (This)
onRangeStart(avar-1) start=5, end=7, frame=3575        (is)
onRangeStart(avar-1) start=8, end=9, frame=6270        (a)
onRangeStart(avar-1) start=10, end=18, frame=7810      (sentence)
onRangeStart(avar-1) start=19, end=21, frame=18535     (to)
onRangeStart(avar-1) start=22, end=26, frame=21285     (read)
onRangeStart(avar-1) start=27, end=32, frame=25795     (aloud)
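
Judging by the output above, start and end look like character offsets into the text passed to speak(), with end exclusive. Below is a rough plain-Java sketch of how those offsets could be turned into a highlighted word. The class and method names (RangeHighlightSketch, wordAt, highlight, frameToMillis) are made up for illustration and are not part of the Android API. The 24000 Hz sample rate is only a placeholder, since the log does not show the real one. In an app you would keep the original text yourself (the callback only passes the utterance id) and call something like this from onRangeStart().

```java
/** Sketch: apply the (start, end) offsets reported by onRangeStart() to the spoken text. */
final class RangeHighlightSketch {

    /** Returns the text covered by a range; end is treated as exclusive, matching the log. */
    static String wordAt(String text, int start, int end) {
        // Clamp defensively in case an engine reports offsets outside the text.
        int s = Math.max(0, Math.min(start, text.length()));
        int e = Math.max(s, Math.min(end, text.length()));
        return text.substring(s, e);
    }

    /** Wraps the current range in brackets, a plain-text stand-in for real UI highlighting. */
    static String highlight(String text, int start, int end) {
        int s = Math.max(0, Math.min(start, text.length()));
        int e = Math.max(s, Math.min(end, text.length()));
        return text.substring(0, s) + "[" + text.substring(s, e) + "]" + text.substring(e);
    }

    /** Converts an audio frame position to milliseconds; the sample rate is an assumption. */
    static long frameToMillis(int frame, int sampleRateHz) {
        return frame * 1000L / sampleRateHz;
    }

    public static void main(String[] args) {
        String text = "This is a sentence to read aloud.";
        // (start, end, frame) triples copied from the debug output above.
        int[][] ranges = {
            {0, 4, 275}, {5, 7, 3575}, {8, 9, 6270}, {10, 18, 7810},
            {19, 21, 18535}, {22, 26, 21285}, {27, 32, 25795}
        };
        int assumedSampleRateHz = 24000; // placeholder, not taken from the log
        for (int[] r : ranges) {
            System.out.println(frameToMillis(r[2], assumedSampleRateHz) + " ms: "
                    + highlight(text, r[0], r[1]));
        }
    }
}
```

With the logged offsets, wordAt() gives back exactly the words shown in parentheses in the output, and the trailing period is not covered by any range.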

It would be helpful if someone from Google told us officially that "ranges" are simply words, at least for now.
