How do you implement a “Did you mean”? [duplicate]

Posted 2019-01-03 07:49

Possible Duplicate:
How does the Google “Did you mean?” Algorithm work?

Suppose you already have a search system on your website. How can you implement a "Did you mean: <spell_checked_word>" feature like the one Google shows for some search queries?

Tags: nlp

17 answers
Root(大扎)
#2 · 2019-01-03 08:05

Implementing spelling correction for search engines effectively is not trivial (you can't just compute the edit/Levenshtein distance to every possible word). A solution based on k-gram indexes is described in Introduction to Information Retrieval (full text available online).
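As a rough illustration of the k-gram idea (the toy vocabulary, Jaccard scoring, and threshold below are my own assumptions, not the book's exact algorithm): index every word under each of its k-grams, shortlist only the words that share enough k-grams with the misspelled query, and run edit distance on that short list alone.

from collections import defaultdict

def kgrams(word, k=2):
    """k-grams of a word, padded with '$' boundary markers."""
    padded = "$" + word + "$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

# Hypothetical vocabulary, e.g. harvested from your document index.
vocabulary = ["configure", "confirm", "conjure", "figure"]

# The k-gram index maps each k-gram to the words containing it.
index = defaultdict(set)
for word in vocabulary:
    for gram in kgrams(word):
        index[gram].add(word)

def candidates(query, min_jaccard=0.5):
    """Shortlist words whose k-gram sets overlap enough with the query's."""
    grams = kgrams(query)
    hits = defaultdict(int)
    for gram in grams:
        for word in index[gram]:
            hits[word] += 1
    return [w for w, n in hits.items()
            if n / len(grams | kgrams(w)) >= min_jaccard]

# Only the shortlist goes on to an exact edit-distance check.
print(candidates("configre"))  # ['configure']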

趁早两清
#3 · 2019-01-03 08:08

Google's Dr Norvig has outlined how it works; he even gives a roughly 20-line Python implementation:

http://googlesystem.blogspot.com/2007/04/simplified-version-of-googles-spell.html

http://www.norvig.com/spell-correct.html

Dr Norvig also discusses "did you mean" in this excellent talk. Dr Norvig is head of research at Google, so when he is asked how "did you mean" is implemented, his answer is authoritative.

So it's spell checking, presumably with a dynamic dictionary built from other searches or even actual internet phrases and such. But it's still spell checking.

SOUNDEX and other guesses don't get a look-in, people!
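For the record, here is a condensed sketch in the spirit of Norvig's corrector; the inline word counts are a tiny stand-in for the large corpus his article trains on:

from collections import Counter

# Tiny stand-in corpus; Norvig trains on a large text file instead.
WORDS = Counter("the quick brown fox spelling correction of the query".split())

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    """Keep only candidates that actually occur in the corpus."""
    return {w for w in words if w in WORDS}

def correction(word):
    """The most frequent known word at the smallest edit distance."""
    candidates = (known([word]) or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=lambda w: WORDS[w])

print(correction("speling"))  # -> 'spelling'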

狗以群分
#4 · 2019-01-03 08:10

I think this depends on how big your website is. On our local intranet, which is used by about 500 members of staff, I simply look at the search phrases that returned zero results and enter each such phrase, together with a suggested replacement, into a SQL table.

I then consult that table when a search returns no results. However, this only works if the site is relatively small, and I only do it for the most common search phrases.
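For what it's worth, here is a minimal sketch of that lookup, assuming SQLite; the schema and sample data are hypothetical:

import sqlite3

# Hypothetical schema: one row per failed phrase with its hand-picked fix.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE suggestions (
                  failed_phrase    TEXT PRIMARY KEY,
                  suggested_phrase TEXT NOT NULL)""")
db.execute("INSERT INTO suggestions VALUES (?, ?)",
           ("anual leave", "annual leave"))

def did_you_mean(phrase):
    """Return the curated suggestion for a phrase, if one exists."""
    row = db.execute("SELECT suggested_phrase FROM suggestions "
                     "WHERE failed_phrase = ?",
                     (phrase.lower(),)).fetchone()
    return row[0] if row else None

results = []  # pretend the real search came back empty
if not results:
    hint = did_you_mean("Anual Leave")
    if hint:
        print(f'Did you mean: "{hint}"?')  # Did you mean: "annual leave"?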

You might also want to look at my answer to a similar question:

一夜七次
#5 · 2019-01-03 08:11

You could use n-grams for the comparison: http://en.wikipedia.org/wiki/N-gram

Using the Python ngram module: http://packages.python.org/ngram/index.html

import ngram

# Index a few candidate phrases; search() returns (match, similarity)
# pairs, sorted by decreasing similarity.
G2 = ngram.NGram(["iis7 configure ftp 7.5",
                  "ubunto configre 8.5",
                  "mac configure ftp"])

print("Similarity", "\t", "String")
for match, similarity in G2.search("iis7 configurftp 7.5", threshold=0.1):
    print(round(similarity, 2), "\t", match)

You get:

Similarity 	 String
0.76 	 iis7 configure ftp 7.5
0.24 	 mac configure ftp
0.19 	 ubunto configre 8.5
唯我独甜
#6 · 2019-01-03 08:13

I do it with Lucene's Spell Checker.

【Aperson】
#7 · 2019-01-03 08:16

I was pleasantly surprised to see someone ask how to create a state-of-the-art spelling-suggestion system for search engines. I have been working on this subject for more than a year at a search engine company, and I can point to information in the public domain on the subject.

As was mentioned in a previous post, Google (and Microsoft, and Yahoo!) do not use any predefined dictionary, nor do they employ hordes of linguists who ponder over the possible misspellings of queries. That would be impossible, both because of the scale of the problem and because it is not clear that people could actually identify correctly when and whether a query is misspelled.

Instead there is a simple and rather effective principle, which is also valid for all European languages: take all the unique queries in your search logs, calculate the edit distance between pairs of queries, and treat the query with the highest count as the reference (corrected) form.
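A toy sketch of that principle (the query counts below are invented; a real log would hold millions of unique queries): suggest the most frequent logged query within a small edit distance of the user's query.

from collections import Counter

# Invented query-log counts standing in for real search logs.
query_counts = Counter({
    "britney spears": 90_000,
    "brittany spears": 4_000,
    "brittney spears": 2_500,
})

def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

def did_you_mean(query, max_distance=2):
    """Suggest the most frequent logged query within max_distance edits."""
    count = query_counts[query]
    nearby = [(c, q) for q, c in query_counts.items()
              if c > count and edit_distance(query, q) <= max_distance]
    return max(nearby)[1] if nearby else None

print(did_you_mean("brittney spears"))  # -> 'britney spears'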

This simple algorithm will work great for many types of queries. If you want to take it to the next level, I suggest you read the paper by Microsoft Research on that subject. You can find it here

The paper has a great introduction, but after that you will need to be familiar with concepts such as the hidden Markov model.
