How do you implement a “Did you mean”? [duplicate]

Posted 2019-01-03 07:49

Possible Duplicate:
How does the Google “Did you mean?” Algorithm work?

Suppose you already have a search system on your website. How can you implement a "Did you mean: <spell_checked_word>" suggestion like the one Google shows for some search queries?

Tags: nlp
17 answers
欢心
#2 · 2019-01-03 07:57

I would suggest looking at SOUNDEX to find similar words in your database.
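For illustration, here is a minimal sketch of the classic American Soundex code in Python, assuming plain English words. Many databases ship their own SOUNDEX function, so this is only meant to show what the code captures; the function name `soundex` is an illustrative choice.

```python
# Map each consonant to its Soundex digit; vowels and y get no digit.
_CODES = {}
for _digit, _letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word ("" for no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]


print(soundex("haert"), soundex("heart"))
```

Words that sound alike share a code, so a misspelling can be matched against an indexed column of precomputed codes.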

You can also access Google's own dictionary by using the Google API spelling suggestion request.

叛逆
#3 · 2019-01-03 07:57

Soundex and "Porter stemming" (Soundex is trivial; I'm not sure about Porter stemming).

仙女界的扛把子
#4 · 2019-01-03 07:59

What Google does is actually far from trivial and, at first, counter-intuitive. They don't check against a dictionary; rather, they use statistics to identify "similar" queries that returned more results than your query. The exact algorithm is, of course, not known.
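To make the idea concrete, here is a rough sketch in Python. It is not Google's algorithm (which is not public); it only assumes you keep a log of past queries with their result counts. The function name `did_you_mean`, the `min_gain` threshold, and the sample counts are all made up for illustration, and similarity comes from the standard library's `difflib`.

```python
from difflib import get_close_matches


def did_you_mean(query, result_counts, min_gain=10.0):
    """Return a similar past query with far more results, or None.

    result_counts maps past queries to how many results they returned.
    min_gain is a hypothetical threshold: the suggestion must have at
    least min_gain times as many results as the original query.
    """
    own = result_counts.get(query, 0)
    similar = get_close_matches(query, list(result_counts), n=5, cutoff=0.8)
    better = [
        candidate
        for candidate in similar
        if candidate != query and result_counts[candidate] >= max(own, 1) * min_gain
    ]
    return max(better, key=result_counts.__getitem__, default=None)


# Invented numbers, purely for the example.
counts = {
    "britney spears": 120000,
    "brittany spears": 900,
    "britny spears": 40,
    "brine shrimp": 5000,
}
print(did_you_mean("britny spears", counts))
```

A query that already has the most results among its neighbours gets no suggestion, which is the behaviour you want for correctly spelled searches.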

There are several sub-problems to solve here. As a foundation for anything related to statistical Natural Language Processing, there is one must-have book: Foundations of Statistical Natural Language Processing.

Concretely, to solve the problem of word/query similarity, I have had good results with edit distance, a mathematical measure of string similarity that works surprisingly well. I used Levenshtein distance, but other variants may be worth looking into.
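As a sketch of what that looks like in Python: a standard dynamic-programming Levenshtein distance, plus a small helper that ranks vocabulary words by distance. The helper `closest_words` and its `max_distance` parameter are illustrative names, not part of any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row buffer as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def closest_words(word, vocabulary, max_distance=2):
    """Vocabulary words within max_distance edits, nearest first."""
    scored = [(levenshtein(word, v), v) for v in vocabulary]
    return [v for d, v in sorted(scored) if d <= max_distance]


print(closest_words("haert", ["heart", "hart", "hear", "go"]))
```

Note that plain Levenshtein counts a swapped pair of letters as two edits, so "hart" ranks ahead of "heart" here. Breaking ties with word or query frequency usually helps.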

Soundex - in my experience - is crap.

Efficiently storing and searching a large dictionary of misspelled words with sub-second retrieval is, again, non-trivial. Your best bet is to use an existing full-text indexing and retrieval engine (i.e. not the one built into your database); Lucene is currently one of the best and has been ported to many platforms.
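To show why an index helps, here is a toy in-memory sketch in Python of one common approach: an inverted index over character n-grams, used to shortlist candidates before running a more expensive comparison such as edit distance. This is not Lucene's API; the class `NgramIndex`, the `$` padding, and the `min_shared` threshold are assumptions made for the example.

```python
from collections import Counter, defaultdict


def ngrams(word, n=2):
    """Character n-grams of a word, padded so the ends count too."""
    padded = f"${word.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class NgramIndex:
    """Toy inverted index: n-gram -> set of words containing it."""

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(set)

    def add(self, word):
        for gram in ngrams(word, self.n):
            self.postings[gram].add(word)

    def candidates(self, query, min_shared=2):
        """Words sharing at least min_shared n-grams, best match first."""
        shared = Counter()
        for gram in ngrams(query, self.n):
            for word in self.postings.get(gram, ()):
                shared[word] += 1
        hits = [w for w, count in shared.items() if count >= min_shared]
        return sorted(hits, key=lambda w: (-shared[w], w))


index = NgramIndex()
for entry in ["heart", "hearth", "will", "wall", "go"]:
    index.add(entry)
print(index.candidates("haert"), index.candidates("wil"))
```

Only the words that share n-grams with the query are ever touched, so lookup cost depends on the size of the posting lists rather than on the whole dictionary.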

#5 · 2019-01-03 07:59

There's something called aspell that might help: http://blog.evanweaver.com/files/doc/fauna/raspell/classes/Aspell.html

There's a Ruby gem for it, but I don't know how to talk to it from Python: http://blog.evanweaver.com/files/doc/fauna/raspell/files/README.html

Here's a quote from the Ruby implementation:

Usage

Aspell lets you check words and suggest corrections. For example:

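  require "raspell"

  # Setup added so the snippet runs on its own; assumes the "en_US"
  # aspell dictionary is installed.
  speller = Aspell.new("en_US")

  string = "my haert wil go on"

  # Check every word and print the top suggestion for each misspelling.
  string.gsub(/[\w\']+/) do |word|
    if !speller.check(word)
      # word is wrong
      puts "Possible correction for #{word}:"
      puts speller.suggest(word).first
    end
  end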

This outputs:

  Possible correction for haert:
  heart
  Possible correction for wil:
  Will
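As for Python, one option that avoids bindings altogether is to drive the `aspell` command-line program through its ispell-compatible pipe mode (`aspell -a`). The sketch below assumes `aspell` and an English dictionary are installed and on the PATH; the helper names `parse_aspell_line` and `suggest` are made up for illustration.

```python
import subprocess


def parse_aspell_line(line):
    """Parse one result line of the pipe protocol.

    "& word count offset: s1, s2" -> (word, [s1, s2])
    "# word offset"               -> (word, [])   (no suggestions)
    anything else ("*", banner, blank) -> None
    """
    if line.startswith("&"):
        head, _, suggestions = line.partition(": ")
        return head.split()[1], suggestions.split(", ")
    if line.startswith("#"):
        return line.split()[1], []
    return None


def suggest(text, lang="en"):
    """Return {misspelled_word: [suggestions]} for one line of text."""
    proc = subprocess.run(
        ["aspell", "-a", f"--lang={lang}"],
        input="^" + text + "\n",  # "^" stops the line being read as a command
        capture_output=True,
        text=True,
        check=True,
    )
    parsed = (parse_aspell_line(line) for line in proc.stdout.splitlines())
    return dict(item for item in parsed if item)
```

Calling `suggest("my haert wil go on")` should then return the misspelled words with aspell's suggestions; the exact lists depend on the installed dictionary.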

Deceive 欺骗
#6 · 2019-01-03 08:02

You may want to look at Peter Norvig's "How to Write a Spelling Corrector" article.
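A condensed sketch in the spirit of that article: generate every string one edit away from the input and pick the known word with the highest frequency. This version is simplified (single edits only), and the tiny `CORPUS` string is a stand-in for a real word-frequency source such as your own search logs or indexed documents.

```python
import re
from collections import Counter

# Toy corpus, assumed for the example; use real text or query logs in practice.
CORPUS = "my heart will go on and on the heart will go where the will is"
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """The subset of words that appear in the corpus."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Most frequent known word within one edit, else the word itself."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=lambda w: WORDS[w])


print(correction("haert"), correction("wil"))
```

Because transpositions are generated explicitly, "haert" is a single edit from "heart" here, unlike with plain Levenshtein distance.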

\"骚年 ilove
#7 · 2019-01-03 08:04

If you have industry-specific terms, you will likely need a thesaurus. For example, I worked in the jewelry industry, and our descriptions contained abbreviations such as kt (karat), rd (round), and cwt (carat weight). Endeca (the search engine at that job) has a thesaurus that will translate from common misspellings, but it does require manual intervention.
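A hand-maintained thesaurus can be as simple as a lookup table applied to the query before it reaches the search engine. The Python sketch below uses the abbreviations mentioned above; it is not Endeca's API, and the names `THESAURUS` and `expand_query` are made up for illustration.

```python
import re

# Manually curated abbreviation -> expansion table (entries from this answer).
THESAURUS = {
    "kt": "karat",
    "rd": "round",
    "cwt": "carat weight",
}


def expand_query(query, thesaurus=THESAURUS):
    """Replace each alphabetic token that has a thesaurus entry."""
    def replace(match):
        token = match.group(0)
        return thesaurus.get(token.lower(), token)

    return re.sub(r"[A-Za-z]+", replace, query)


print(expand_query("14 kt rd ring"))
```

The same table can hold known misspellings, which is where the manual upkeep comes in.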
