I've been developing an internal website for a portfolio management tool. There is a lot of text data, such as company names. I've been really impressed with some search engines' ability to respond very quickly to queries with "Did you mean: xxxx".
I need to be able to take a user query and respond intelligently, not only with raw search results but also with a "Did you mean?" response when there is a highly likely alternative.
[I'm developing in ASP.NET (VB, don't hold it against me!)]
UPDATE: OK, how can I mimic this without the millions of 'unpaid users'?
- Generate typos for each 'known' or 'correct' term and perform lookups?
- Some other more elegant method?
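For what it's worth, the first option (pre-generating typos for each known term) could look roughly like the sketch below. It is in Python rather than VB, and the function names (`one_edit_variants`, `build_typo_index`, `did_you_mean`) are made up for illustration. It assumes lowercase ASCII terms and only covers typos one edit away from a known term.

```python
import string


def one_edit_variants(term, alphabet=string.ascii_lowercase):
    """Return every string one delete, transpose, replace, or insert away from term."""
    splits = [(term[:i], term[i:]) for i in range(len(term) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts) - {term}


def build_typo_index(known_terms):
    """Map each generated typo to the known terms it could have come from."""
    known = {t.lower() for t in known_terms}
    index = {}
    for term in sorted(known):
        for typo in one_edit_variants(term):
            if typo not in known:  # a correct term is never treated as a typo
                index.setdefault(typo, []).append(term)
    return index


def did_you_mean(query, index):
    """Return the known terms the query is a one-edit typo of (possibly empty)."""
    return index.get(query.lower(), [])


index = build_typo_index(["microsoft", "oracle", "apple"])
print(did_you_mean("mircosoft", index))
```

The index grows quickly (a few hundred variants per term), so this only seems practical for a modest list of known terms.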
Do you mean a spell checker? If you need to check single words rather than whole phrases, I've got a link about spell checking where the algorithm is developed in Python. Check this link
Meanwhile, I am also working on a project that involves searching databases using text. I guess this would solve your problem.
Simple. They have tons of data. They have statistics for every possible term, based on how often it is queried, and what variations of it usually yield results the users click... so, when they see you typed a frequent misspelling for a search term, they go ahead and propose the more usual answer.
Actually, if the misspelling is in fact the most frequently searched term, the algorithm will take it for the right one.
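A toy version of that idea, assuming you have your own log of query counts: among the logged queries that look similar to what the user typed, propose the one that is queried most often. This is only a sketch and not how any real search engine does it; `suggest_from_log`, `min_ratio`, and the sample counts are invented for illustration, and similarity here comes from the standard library's `difflib.get_close_matches`.

```python
from collections import Counter
from difflib import get_close_matches


def suggest_from_log(query, query_counts, min_ratio=2.0, cutoff=0.75):
    """Suggest a similar logged query that is clearly more frequent, else None."""
    q = query.lower()
    candidates = get_close_matches(q, list(query_counts), n=5, cutoff=cutoff)
    best = max(candidates, key=lambda t: query_counts[t], default=None)
    if best is None or best == q:
        return None  # nothing similar, or the query is already the popular form
    # Only suggest when the alternative is much more popular than the query.
    if query_counts[best] >= min_ratio * query_counts.get(q, 0):
        return best
    return None


# Hypothetical query log: the usual spelling dominates.
counts = Counter({"vodafone": 900, "vodaphone": 40, "volkswagen": 300})
print(suggest_from_log("vodaphone", counts))

# If the "misspelling" is the most frequent form, it wins instead.
flipped = Counter({"vodaphone": 900, "vodafone": 40})
print(suggest_from_log("vodafone", flipped))
```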
Use Levenshtein distance, then build a metric tree (or Slim-tree) to index the words. Then run a 1-nearest-neighbour query, and you have your result.
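Here is a minimal sketch of that approach in Python. It uses a BK-tree, which is one simple kind of metric tree for integer-valued distances, swapped in here instead of a Slim-tree; the class and method names are my own. The 1-nearest-neighbour search prunes subtrees using the triangle inequality.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


class BKTree:
    """Each node is (word, children), with children keyed by distance to the node."""

    def __init__(self, words=()):
        self.root = None
        for word in words:
            self.add(word)

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # already indexed
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def nearest(self, query):
        """Return (distance, word) for the closest indexed word, or None if empty."""
        if self.root is None:
            return None
        best = (levenshtein(query, self.root[0]), self.root[0])
        stack = [self.root]
        while stack:
            word, children = stack.pop()
            d = levenshtein(query, word)
            if d < best[0]:
                best = (d, word)
            for edge, child in children.items():
                # Triangle inequality: nothing under this edge is closer than |edge - d|.
                if abs(edge - d) < best[0]:
                    stack.append(child)
        return best


tree = BKTree(["google", "apple", "microsoft", "oracle", "amazon"])
print(tree.nearest("gogle"))
```

In practice you would probably only show a "Did you mean" suggestion when the returned distance is small relative to the length of the query.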
Apart from the above answers, in case you want to implement something yourself quickly, here is a suggestion:
Algorithm
You can find the implementation and detailed documentation of this algorithm on GitHub.
Regarding your question of how to mimic the behavior without having tons of data: why not use the tons of data collected by Google? Download the Google search results for the misspelled word and search for "Did you mean:" in the HTML.
I guess that's called mashup nowadays :-)
This is an old question, and I'm surprised that nobody suggested that the OP use Apache Solr.
Apache Solr is a full-text search engine that, besides much other functionality, also provides spellchecking and query suggestions. From the documentation: