How do you implement a “Did you mean”? [duplicate]

Posted 2019-01-03 07:49

Possible Duplicate:
How does the Google “Did you mean?” Algorithm work?

Suppose you already have a search system on your website. How can you implement a "Did you mean: <spell_checked_word>" feature like the one Google shows for some search queries?

Tags: nlp

17 answers
Root(大扎)
#2 · 2019-01-03 08:05

Implementing spelling correction for search engines effectively is not trivial (you can't just compute the edit/Levenshtein distance to every possible word). A solution based on k-gram indexes is described in Introduction to Information Retrieval (full text available online).
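As a rough illustration of the k-gram idea (the toy vocabulary, Jaccard scoring, and threshold below are my own assumptions, not the book's exact algorithm): index every word under each of its k-grams, shortlist only the words that share enough k-grams with the misspelled query, and run edit distance on that short list alone.

from collections import defaultdict

def kgrams(word, k=2):
    """k-grams of a word, padded with '$' boundary markers."""
    padded = "$" + word + "$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

# Hypothetical vocabulary, e.g. harvested from your document index.
vocabulary = ["configure", "confirm", "conjure", "figure"]

# The k-gram index maps each k-gram to the words containing it.
index = defaultdict(set)
for word in vocabulary:
    for gram in kgrams(word):
        index[gram].add(word)

def candidates(query, min_jaccard=0.5):
    """Shortlist words whose k-gram sets overlap enough with the query's."""
    grams = kgrams(query)
    hits = defaultdict(int)
    for gram in grams:
        for word in index[gram]:
            hits[word] += 1
    return [w for w, n in hits.items()
            if n / len(grams | kgrams(w)) >= min_jaccard]

# Only the shortlist goes on to an exact edit-distance check.
print(candidates("configre"))  # ['configure']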

趁早两清
#3 · 2019-01-03 08:08

Google's Dr Norvig has outlined how it works; he even gives a roughly 20-line Python implementation:

http://googlesystem.blogspot.com/2007/04/simplified-version-of-googles-spell.html

http://www.norvig.com/spell-correct.html

Dr Norvig also discusses "did you mean" in this excellent talk. Dr Norvig is head of research at Google, so when he is asked how "did you mean" is implemented, his answer is authoritative.

So it's spell checking, presumably with a dynamic dictionary built from other searches or even actual internet phrases and such. But it's still spell checking.

SOUNDEX and other guesses don't get a look-in, people!
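For the record, here is a condensed sketch in the spirit of Norvig's corrector; the inline word counts are a tiny stand-in for the large corpus his article trains on:

from collections import Counter

# Tiny stand-in corpus; Norvig trains on a large text file instead.
WORDS = Counter("the quick brown fox spelling correction of the query".split())

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only candidates that actually occur in the corpus."""
    return {w for w in words if w in WORDS}

def correction(word):
    """The most frequent known word at the smallest edit distance."""
    candidates = (known([word]) or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=lambda w: WORDS[w])

print(correction("speling"))  # -> 'spelling'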

狗以群分
#4 · 2019-01-03 08:10

I think this depends on how big your website is. On our local intranet, which is used by about 500 members of staff, I simply look at the search phrases that returned zero results and enter each such phrase, together with a suggested replacement, into a SQL table.

I then consult that table when a search returns no results. However, this only works if the site is relatively small, and I only do it for the most common search phrases.
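For what it's worth, here is a minimal sketch of that lookup, assuming SQLite; the schema and sample data are hypothetical:

import sqlite3

# Hypothetical schema: one row per failed phrase with its hand-picked fix.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE suggestions (
                  failed_phrase    TEXT PRIMARY KEY,
                  suggested_phrase TEXT NOT NULL)""")
db.execute("INSERT INTO suggestions VALUES (?, ?)",
           ("anual leave", "annual leave"))

def did_you_mean(phrase):
    """Return the curated suggestion for a phrase, if one exists."""
    row = db.execute("SELECT suggested_phrase FROM suggestions "
                     "WHERE failed_phrase = ?",
                     (phrase.lower(),)).fetchone()
    return row[0] if row else None

results = []  # pretend the real search came back empty
if not results:
    hint = did_you_mean("Anual Leave")
    if hint:
        print(f'Did you mean: "{hint}"?')  # Did you mean: "annual leave"?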

You might also want to look at my answer to a similar question:

一夜七次
#5 · 2019-01-03 08:11

You could use n-grams for the comparison: http://en.wikipedia.org/wiki/N-gram

Using the Python ngram module: http://packages.python.org/ngram/index.html

import ngram

# Index a few candidate phrases; search() returns (match, similarity)
# pairs, sorted by decreasing similarity.
G2 = ngram.NGram(["iis7 configure ftp 7.5",
                  "ubunto configre 8.5",
                  "mac configure ftp"])

print("Similarity", "\t", "String")
for match, similarity in G2.search("iis7 configurftp 7.5", threshold=0.1):
    print(round(similarity, 2), "\t", match)

You get:

Similarity 	 String
0.76 	 iis7 configure ftp 7.5
0.24 	 mac configure ftp
0.19 	 ubunto configre 8.5
唯我独甜
#6 · 2019-01-03 08:13

I do it with Lucene's Spell Checker.

【Aperson】
#7 · 2019-01-03 08:16

I was pleasantly surprised to see someone ask how to create a state-of-the-art spelling-suggestion system for search engines. I have been working on this subject for more than a year at a search engine company, and I can point to information in the public domain on the subject.

As was mentioned in a previous post, Google (and Microsoft, and Yahoo!) do not use any predefined dictionary, nor do they employ hordes of linguists who ponder over the possible misspellings of queries. That would be impossible, both because of the scale of the problem and because it is not clear that people could actually identify correctly when and whether a query is misspelled.

Instead there is a simple and rather effective principle, which is also valid for all European languages: take all the unique queries in your search logs, calculate the edit distance between pairs of queries, and treat the query with the highest count as the reference (corrected) form.
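A toy sketch of that principle (the query counts below are invented; a real log would hold millions of unique queries): suggest the most frequent logged query within a small edit distance of the user's query.

from collections import Counter

# Invented query-log counts standing in for real search logs.
query_counts = Counter({
    "britney spears": 90_000,
    "brittany spears": 4_000,
    "brittney spears": 2_500,
})

def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def did_you_mean(query, max_distance=2):
    """Suggest the most frequent logged query within max_distance edits."""
    count = query_counts[query]
    nearby = [(c, q) for q, c in query_counts.items()
              if c > count and edit_distance(query, q) <= max_distance]
    return max(nearby)[1] if nearby else None

print(did_you_mean("brittney spears"))  # -> 'britney spears'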

This simple algorithm will work great for many types of queries. If you want to take it to the next level, I suggest you read the paper by Microsoft Research on that subject. You can find it here

The paper has a great introduction, but after that you will need to be familiar with concepts such as the hidden Markov model.
