Best way to deal with misspellings in a MySQL full

I have about 2000 rows in a mysql database.

Each row is a max of 300 characters and contains a sentence or two.

I use mysql's built in fulltext search to search these rows.

I would like to add a feature so that typos and accidental mispellings are corrected, if possible.

For example, if someone types "right shlder" into the searchbox, this would equate to "right shoulder" when performing the search.

What are your suggestions on the simplest way to add this kind of functionality? Is it worth adding an external search engine of some kind, like lucene? (It seems like for such a small dataset, this is overkill.) Or is there a simpler way?

标签： php mysql lucene full-text-search sphinx

3条回答

闹够了就滚

2楼-- · 2020-02-09 07:41

The technical term for what you are looking for, is Levenshtein distance which is used to calculate the difference between two sequences (in this case a sequence of characters which is a string).

PHP actually has two built in function for that, the first being similar_text and the other called levenshtein which should help you out with your problem. You will have to benchmark if it is fast enough for your needs.

0人赞添加讨论(0) 举报

等我变得足够好

3楼-- · 2020-02-09 07:44

MySQL doesn't support SOUNDEX search in fulltext.

If you want to implemente a lucene like framework, it means that you have to take all the documents, splits them into words, and then builds an index for each word.

When someone search for "right shlder" you have to make a SOUNDEX search for each words in the worlds table:

    $search = 'right shlder';
preg_match_all('(\w+)', $search, $matches);
if (!empty($matches[0]))
   $sounds = array_map('soundex', $matches[0]);
$query = 'SELECT word FROM words_list
    WHERE SOUNDEX(word) IN(\''.join('\',\'',$sounds).'\')';

and then make a fulltext search:

$query2 = 'SELECT * FROM table
    WHERE MATCH(fultextcolumn)
    AGAINST ('.join (' OR ', $resuls).' IN BINARY MODE)';

Where $result is an array with the results of the first query.

0人赞添加讨论(0) 举报

The star\"

4楼-- · 2020-02-09 07:46

I think you should use SOUNDS LIKE or SOUNDEX()

As your data set is so small, one solution may be to create a new table to store the individual words or soundex values contained in each text field and use SOUNDS LIKE on that table.

e.g:

SELECT * FROM table where id IN 
(
    SELECT refid FROM tableofwords 
    WHERE column SOUNDS LIKE 'right' OR column SOUNDS LIKE 'shlder'
)

see: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html

I belive it is not possible to wild card seach the string :(

0人赞添加讨论(0) 举报

Best way to deal with misspellings in a MySQL full

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间