Open source spell check

Posted 2019-04-04 18:58

Question:

I am evaluating adding spell check to a product I own. From my research, these are the major decisions that need to be made:

  1. The library to use.
  2. The dictionary (this can be region specific: British English, American English, etc.).
  3. Exclusion lists. Any time a typo is detected, it is possible that it is not a typo but verbiage specific to the user. At that point the user should be able to add the word to their custom exclusion list.
  4. Besides the per-user custom list, a list of exclusions based on the user space of the tool's clients, that is, terms and acronyms from the users' work domain. For example, FX will not be a typo for currency traders.
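
To make points 3 and 4 concrete, here is a minimal sketch of the lookup order I have in mind: the per-user list and the domain list are consulted before the dictionary. It assumes case-insensitive matching, and `make_checker`, `dictionary_has` and the sample word lists are hypothetical names for illustration; the real dictionary lookup would come from whichever library is chosen.

```python
from typing import Callable, Iterable, Set


def make_checker(
    dictionary_has: Callable[[str], bool],
    domain_terms: Iterable[str],
    user_terms: Iterable[str],
) -> Callable[[str], bool]:
    """Return a function that reports whether a word should be flagged."""
    # Normalise both exclusion lists once so lookups are case-insensitive.
    domain: Set[str] = {t.casefold() for t in domain_terms}
    user: Set[str] = {t.casefold() for t in user_terms}

    def is_typo(word: str) -> bool:
        key = word.casefold()
        # Exclusion lists win over the dictionary.
        if key in user or key in domain:
            return False
        return not dictionary_has(word)

    return is_typo


# Stand-in for the real spell-check library (hypothetical word list).
base_words = {"the", "trade", "settled", "today"}
is_typo = make_checker(
    dictionary_has=lambda w: w.casefold() in base_words,
    domain_terms=["FX"],    # shared by all currency-trading clients
    user_terms=["Acme"],    # added by one user
)
```

With this setup, `is_typo("FX")` and `is_typo("Acme")` are false while `is_typo("teh")` is true.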

My open questions are listed below; any input on them would be very useful. For 1, I was thinking of Hunspell, which is the open source library offered under the MPL and used by Firefox and the OpenOffice family of products. Any horror stories out there from using it? Any grey areas with the licensing? The spell checking will happen on a Windows client.

Dictionaries are available from a variety of sources, some free under the MPL and some not. Any suggestions on good sources for free dictionaries?

What about multilingual support, and what needs to be worked out to support it?

For 4, how are custom dictionaries kept in sync between the server side and the client side? The spell check needs to happen on the client side, so are they pushed down with every initial launch, or are they synced every so often?
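
To illustrate the second option (periodic sync rather than a full push on every launch), here is a rough sketch that assumes the server stamps each added word with a version number and the client asks only for words newer than the version it last saw. `ServerDictionary`, `ClientCache` and their methods are hypothetical names, the network call is replaced by a direct method call, and removals are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ServerDictionary:
    """Server-side custom word list with a monotonically increasing version."""
    version: int = 0
    # Maps each word to the version in which it was added.
    added_at: Dict[str, int] = field(default_factory=dict)

    def add(self, word: str) -> None:
        if word not in self.added_at:
            self.version += 1
            self.added_at[word] = self.version

    def changes_since(self, client_version: int) -> Tuple[int, List[str]]:
        """Return the current version and the words added after client_version."""
        new_words = sorted(w for w, v in self.added_at.items() if v > client_version)
        return self.version, new_words


@dataclass
class ClientCache:
    """Client-side copy of the custom word list."""
    version: int = 0
    words: Set[str] = field(default_factory=set)

    def sync(self, server: ServerDictionary) -> int:
        """Pull only the missing words; return how many were downloaded."""
        self.version, new_words = server.changes_since(self.version)
        self.words.update(new_words)
        return len(new_words)
```

A client that is already up to date downloads nothing on `sync`, which is the difference from re-sending the whole list at every launch.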

Answer 1:

As already mentioned, Hunspell is a state-of-the-art spell checker. It is the spell checker used by OpenOffice, Thunderbird, Firefox and Google Chrome. Ports to all major programming languages are available. It works with the OpenOffice dictionaries, so a lot of languages are supported.



Answer 2:

I've used Hunspell for a few things, and I don't really have any horror stories about it. I've only used it with American English, though; it claims to work with other languages.

As for licensing, it offers a choice of GPL, LGPL, and MPL. If you don't like the MPL, you can always choose to use the LGPL.



Answer 3:

There are several popular, widely used options, such as MySpell and Aspell. Check them out.

  • http://en.wikipedia.org/wiki/MySpell
  • http://en.wikipedia.org/wiki/GNU_Aspell


Answer 4:

Here is a good demonstration by Peter Norvig. I find this simple explanation much more intuitive. Follow the links in the document as well for more in-depth analysis.

http://norvig.com/spell-correct.html
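
To give a flavour of the approach, below is a simplified sketch in the spirit of that article, not Norvig's actual code. It generates every string one edit away from the input and picks the most frequent known word. The tiny `WORD_COUNTS` table is made up for illustration (a real one would be built from a large text corpus), and only a single edit is considered.

```python
from collections import Counter
from typing import Set

# Tiny hypothetical frequency table; a real one is built from a large corpus.
WORD_COUNTS = Counter({"spelling": 50, "spell": 30, "check": 40, "checker": 10})
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word: str) -> str:
    """Return the most frequent known word within one edit, else the word itself."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.__getitem__) if candidates else word
```

Here `correct("speling")` finds "spelling" through a single insertion, and `correct("chekc")` finds "check" through a transposition; a word with no known neighbour is returned unchanged.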