A StringToken Parser which gives Google Search sty-第2页回答

Seeking a method to:

Take whitespace separated tokens in a String; return a suggested Word

ie:
Google Search can take "fonetic wrd nterpreterr",
and atop of the result page it shows "Did you mean: phonetic word interpreter"

A solution in any of the C* languages or Java would be preferred.

Are there any existing Open Libraries which perform such functionality?

Or is there a way to Utilise a Google API to request a suggested word?

标签： language-agnostic parsing nlp

8条回答

劫难

2楼-- · 2019-02-02 02:09

Since no one has yet mentioned it, I'll give one more phrase to search for: "edit distance" (for example, link text). That can be used to find closest matches, assuming it's typos where letters are transposed, missing or added.

But usually this is also coupled with some sort of relevancy information; either by simple popularity (to assume most commonly used close-enough match is most likely correct word), or by contextual likelihood (words that follow preceding correct word, or come before one). This gets into information retrieval; one way to start is to look at bigram and trigrams (sequences of words seen together). Google has very extensive freely available data sets for these.

For simple initial solution though a dictionary couple with Levenshtein-based matchers works surprisingly well.

0人赞添加讨论(0) 举报

相关推荐>>

3楼-- · 2019-02-02 02:20

In his article How to Write a Spelling Corrector, Peter Norvig discusses how a Google-like spellchecker could be implemented. The article contains a 20-line implementation in Python, as well as links to several reimplementations in C, C++, C# and Java. Here is an excerpt:

The full details of an industrial-strength spell corrector like Google's would be more confusing than enlightening, but I figured that on the plane flight home, in less than a page of code, I could write a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.

Using Norvig's code and this text as training set, i get the following results:

>>> import spellch
>>> [spellch.correct(w) for w in 'fonetic wrd nterpreterr'.split()]
['phonetic', 'word', 'interpreters']

0人赞添加讨论(0) 举报

上一页 1 2

A StringToken Parser which gives Google Search sty

Seeking a method to:

Take whitespace separated tokens in a String; return a suggested Word

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间