Lightweight fuzzy search library

Can you suggest a lightweight fuzzy text search library?

What I want to do is allow users to find the correct data for search terms that contain typos.

I could use a full-text search engine like Lucene, but I think it's overkill.

Edit:
To make the question clearer, here is the main scenario for the library:
I have a large list of strings. I want to be able to search this list (something like MSVS's IntelliSense), but it should also be possible to filter the list by a string that is not present in it but is close enough to some string that is.
Example:

Red
Green
Blue

When I type 'Gren' or 'Geen' in a text box, I want to see 'Green' in the result set.

Main language for indexed data will be English.

I think that Lucene is too heavy for that task.

Update:

I found one product matching my requirements. It's ShuffleText.
Do you know any alternatives?

Tags: fuzzy-search

8 answers

Melony?

#2 · 2020-02-08 09:27

I'm not sure how well Lucene is suited to fuzzy searching; a custom library would be a better choice. For example, this search is implemented in Java and works quite fast, but it is custom-made for this task: http://www.softcorporation.com/products/people/


霸刀☆藐视天下

#3 · 2020-02-08 09:29

Try Walnutil. It is based on the Lucene API and integrates with SQL Server and Oracle databases. You can create any type of index and then use it. For simple searches you can use some of the methods from walnutilsoft; for more complicated search cases you can use the Lucene API. See the web-based example that uses indexes created with the Walnutil tools. There are also code examples written in Java and C# that you can use for building different types of search. The tools are free. http://www.walnutilsoft.com/


爷、活的狠高调

#4 · 2020-02-08 09:33

If you can choose to use a database, I recommend using PostgreSQL and its fuzzy string matching functions.

If you can use Ruby, I suggest looking into the amatch library.


混吃等死

#5 · 2020-02-08 09:33

@aku - links to working Soundex libraries are right there at the bottom of the page.

As for Levenshtein distance, the Wikipedia article on that also has implementations listed at the bottom.
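
To make the Levenshtein idea concrete for the list-filtering scenario in the question, here is a minimal sketch in Python. It is not taken from any of the libraries mentioned in this thread; the function names `levenshtein` and `fuzzy_filter` and the `max_distance` parameter are made up for illustration, and the case-insensitive comparison is an assumption about what the asker wants.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_filter(query, candidates, max_distance=1):
    """Hypothetical helper: keep candidates within max_distance edits
    of the query (case-insensitive), closest first."""
    q = query.lower()
    scored = [(levenshtein(q, c.lower()), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0])
    return [c for d, c in scored if d <= max_distance]


colors = ["Red", "Green", "Blue"]
print(fuzzy_filter("Gren", colors))  # ['Green']
print(fuzzy_filter("Geen", colors))  # ['Green']
```

This compares the query against every entry, so it is only a starting point for a large list; a real library would typically add an index or a smarter cutoff.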


我想做一个坏孩纸

#6 · 2020-02-08 09:41

Soundex is very 'English' in its encoding - Daitch-Mokotoff works better for many names, especially European (Germanic) and Jewish names. In my UK-centric world, it's what I use.

Wiki here.
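
For readers who have not seen Soundex, the following is a rough Python sketch of the classic American Soundex code (not Daitch-Mokotoff, which uses a larger, different rule table). The function name `soundex` and the decision to ignore non-ASCII characters are assumptions of this sketch, which also illustrates why the encoding is tied to English spelling: the letter groups are fixed and anything outside a-z is simply dropped.

```python
# Consonant groups of American Soundex; vowels, y, h and w get no digit.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code (letter + 3 digits)."""
    # Assumption of this sketch: only ASCII letters are considered.
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Green"), soundex("Gren"))     # G650 G650
print(soundex("Geen"))                       # G500
```

Note that 'Geen' from the question does not get the same code as 'Green', because the dropped 'r' is a coded consonant. A phonetic code on its own therefore does not cover every typo, and an edit-distance check may still be needed.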


淡お忘

#7 · 2020-02-08 09:42

A powerful, lightweight solution is Sphinx.

It's smaller than Lucene and it supports disambiguation.

It's written in C++, it's fast and battle-tested, it has client libraries for every environment, and it's used by large companies such as craigslist.org.

