How does the Google “Did you mean?” Algorithm work-第3页回答

I've been developing an internal website for a portfolio management tool. There is a lot of text data, company names etc. I've been really impressed with some search engines ability to very quickly respond to queries with "Did you mean: xxxx".

I need to be able to intelligently take a user query and respond with not only raw search results but also with a "Did you mean?" response when there is a highly likely alternative answer etc

[I'm developing in ASP.NET (VB - don't hold it against me! )]

UPDATE: OK, how can I mimic this without the millions of 'unpaid users'?

Generate typos for each 'known' or 'correct' term and perform lookups?
Some other more elegant method?

标签： algorithm machine-learning nlp spell-checking text-search

18条回答

还给你的自由

2楼-- · 2018-12-31 23:41

You mean to say spell checker? If it is a spell checker rather than a whole phrase then I've got a link about the spell checking where the algorithm is developed in python. Check this link

Meanwhile, I am also working on project that includes searching databases using text. I guess this would solve your problem

0人赞添加讨论(0) 举报

旧人旧事旧时光

3楼-- · 2018-12-31 23:44

Simple. They have tons of data. They have statistics for every possible term, based on how often it is queried, and what variations of it usually yield results the users click... so, when they see you typed a frequent misspelling for a search term, they go ahead and propose the more usual answer.

Actually, if the misspelling is in effect the most frequent searched term, the algorythm will take it for the right one.

0人赞添加讨论(0) 举报

无色无味的生活

4楼-- · 2018-12-31 23:46

Use Levenshtein distance, then create a Metric Tree (or Slim tree) to index words. Then run a 1-Nearest Neighbour query, and you got the result.

0人赞添加讨论(0) 举报

回忆，回不去的记忆

5楼-- · 2018-12-31 23:46

Apart from the above answers, in case you want to implement something by yourself quickly, here is a suggestion -

Algorithm

You can find the implementation and detailed documentation of this algorithm on GitHub.

Create a Priority Queue with a comparator.
Create a Ternay Search Tree and insert all english words (from Norvig's post) along with their frequencies.
Start traversing the TST and for every word encountered in TST, calculate its Levenshtein Distance(LD) from input_word
If LD ≤ 3 then put it in a Priority Queue.
At Last extract 10 words from the Priority Queue and display.

0人赞添加讨论(0) 举报

步步皆殇っ

6楼-- · 2018-12-31 23:49

regarding your question how to mimic the behavior without having tons of data - why not use tons of data collected by google? Download the google sarch results for the misspelled word and search for "Did you mean:" in the HTML.

I guess that's called mashup nowadays :-)

0人赞添加讨论(0) 举报

浮光初槿花落

7楼-- · 2018-12-31 23:50

This is an old question, and I'm surprised that nobody suggested the OP using Apache Solr.

Apache Solr is a full text search engine that besides many other functionality also provides spellchecking or query suggestions. From the documentation:

By default, the Lucene Spell checkers sort suggestions first by the score from the string distance calculation and second by the frequency (if available) of the suggestion in the index.

0人赞添加讨论(0) 举报

上一页 1 2 3

How does the Google “Did you mean?” Algorithm work

Algorithm

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间