Why isn't speech recognition advancing? [closed]

Posted 2019-03-09 04:24

What's so difficult about the subject that algorithm designers are having a hard time tackling it?

Is it really that complex?

I'm having a hard time grasping why this topic is so problematic. Can anyone give me an example of why this is the case?

21 answers
兄弟一词,经得起流年.
Reply #2 · 2019-03-09 04:36

This kind of problem is more general than speech recognition alone. It also exists in vision processing, natural language processing, artificial intelligence, ...

Speech recognition is affected by the semantic gap problem:

The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols. In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation.

Between an audio waveform and a textual word, the gap is big,

Between the word and its meaning, it is even bigger...

地球回转人心会变
Reply #3 · 2019-03-09 04:36

Because Lernout & Hauspie went bust :)

(sorry, as a Belgian I couldn't resist)

小情绪 Triste *
Reply #4 · 2019-03-09 04:37

You said it yourself, algorithm designers are working on it... but language and speech are not algorithmic constructs. They are the peak of the development of a highly complex human system involving concepts, meta-concepts, syntax, exceptions, grammar, tonality, emotions, neuronal as well as hormonal activity, etc.

Language needs a highly heuristic approach, and that's why progress is slow and the prospects may not be too optimistic.

冷血范
Reply #5 · 2019-03-09 04:39

I don't agree with the assumption in the question - I have recently been introduced to Microsoft's speech recognition and am impressed. It can learn my voice after a few minutes and usually identifies common words correctly. It also allows new words to be added. It is certainly usable for my purposes (understanding chemistry).

Differentiate between recognising the (word) tokens and understanding their meaning.

I don't yet know about other languages or operating systems.

你好瞎i
Reply #6 · 2019-03-09 04:42

I remember reading that Microsoft had a team working on speech recognition, and they called themselves the "Wreck a Nice Beach" team (a name given to them by their own software).

Actually turning speech into words is not as simple as mapping discrete sounds; there has to be an understanding of the context as well. The software would need to have a lifetime of human experience encoded in it.
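
To make the point about context concrete, here is a minimal sketch of how a recognizer might choose between two candidate transcriptions that sound alike. It scores each candidate with a toy bigram language model and keeps the more plausible one. The counts in `BIGRAM_COUNTS`, the `VOCAB_SIZE` value, and the function names are all invented for illustration; a real system would learn such statistics from a large corpus and combine them with acoustic scores.

```python
import math

# Hypothetical bigram counts, made up for this illustration only.
BIGRAM_COUNTS = {
    ("how", "to"): 50,
    ("to", "recognize"): 20,
    ("recognize", "speech"): 15,
    ("to", "wreck"): 2,
    ("wreck", "a"): 3,
    ("a", "nice"): 30,
    ("nice", "beach"): 5,
}
VOCAB_SIZE = 1000  # assumed vocabulary size, used for add-one smoothing

# How often each word appears as the first word of a counted bigram.
FIRST_WORD_COUNTS = {}
for (first, _second), count in BIGRAM_COUNTS.items():
    FIRST_WORD_COUNTS[first] = FIRST_WORD_COUNTS.get(first, 0) + count


def log_score(words):
    """Sum of add-one-smoothed log probabilities of each word given the previous one."""
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        pair = BIGRAM_COUNTS.get((prev, cur), 0)
        total += math.log((pair + 1) / (FIRST_WORD_COUNTS.get(prev, 0) + VOCAB_SIZE))
    return total


def pick_transcription(candidates):
    """Return the candidate word sequence the toy model finds most plausible."""
    return max(candidates, key=log_score)


candidates = [
    "how to recognize speech".split(),
    "how to wreck a nice beach".split(),
]
print(" ".join(pick_transcription(candidates)))
```

With these made-up counts the model prefers "how to recognize speech", but only because the counts say so. The sounds alone do not settle it, which is exactly why the context has to come from somewhere.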

在下西门庆
Reply #7 · 2019-03-09 04:42

I once asked my instructor a similar question; I asked him something like what challenge there is in making a speech-to-text converter. As part of his answer, he asked me to pronounce 'p' and 'b'. Then he said that they differ only for a very short time at the beginning, and after that they sound similar. My point is that it is hard even to recognize which sound was made; recognizing a voice would be even harder. Also, note that once you record people's voices, all you store is numbers. Imagine trying to find metrics like accent, frequency, and other parameters useful for identifying a voice from nothing but input such as matrices of numbers. Computers are good at numerical processing, but voice is not really 'numbers'. You need to encode the voice as numbers and then do all the computation on them.
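
As a rough illustration of the 'p' versus 'b' point, the sketch below builds two synthetic "syllables" as plain lists of numbers and compares them in 10 ms frames. It is not a phonetic model: the only difference I put in is how long the signal stays silent before a sine-wave "vowel" starts, which is a crude stand-in for the short difference at the beginning. The sample rate, the durations, and the names `synth_syllable` and `frame_energies` are assumptions made up for this example.

```python
import math

SAMPLE_RATE = 16000  # samples per second; an assumed rate for this sketch


def synth_syllable(onset_ms, total_ms=200, pitch_hz=120.0):
    """Return samples that are silent for onset_ms, then a sine-wave 'vowel'."""
    n_total = SAMPLE_RATE * total_ms // 1000
    n_onset = SAMPLE_RATE * onset_ms // 1000
    return [
        0.0 if i < n_onset else math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE)
        for i in range(n_total)
    ]


def frame_energies(samples, frame_ms=10):
    """Split the samples into fixed-length frames and return each frame's mean energy."""
    frame_len = SAMPLE_RATE * frame_ms // 1000
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


b_like = synth_syllable(onset_ms=10)  # the 'vowel' starts early
p_like = synth_syllable(onset_ms=50)  # the 'vowel' starts a little later

energies_b = frame_energies(b_like)
energies_p = frame_energies(p_like)

# Indices of the 10 ms frames in which the two signals differ.
differing = [
    i for i, (eb, ep) in enumerate(zip(energies_b, energies_p))
    if abs(eb - ep) > 1e-6
]
print(f"{len(differing)} of {len(energies_b)} frames differ: {differing}")
```

Only a handful of frames near the start differ; everything after that is identical. A recognizer has to base its decision on that small window, and real recordings add noise, different speakers, and different accents on top of it.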
