Machine learning to overcome typos [closed]

Posted 2019-06-08 00:01

Question:

I have a very long list of medicine names (for example crocin, seroflo, oxitab). I need to check whether a particular medicine is present in the list, but my input may contain typos. For instance, I intend to search for crocin but accidentally type crosin. I want a machine learning algorithm to tolerate such typographical errors, so that for small differences like crocin versus crosin it still reports a match.

Answer 1:

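I don't think you need machine learning; a simple edit distance algorithm should handle this. For example, crosin and crocin differ by a single character substitution, so their edit distance is 1.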

https://en.wikipedia.org/wiki/Edit_distance
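A minimal sketch of the edit-distance approach in Python, assuming plain Levenshtein distance (insertions, deletions and substitutions each cost 1) and a hypothetical cutoff of `max_distance=2`. The function names `levenshtein` and `find_medicine` are illustrative, not from any library. The lookup returns the closest name within the cutoff, or `None` if nothing is close enough.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_medicine(query: str, medicines: List[str],
                  max_distance: int = 2) -> Optional[Tuple[str, int]]:
    """Hypothetical helper: return (name, distance) for the closest medicine
    within max_distance (case-insensitive), or None if none is close enough."""
    query = query.strip().lower()
    best: Optional[Tuple[str, int]] = None
    for name in medicines:
        d = levenshtein(query, name.lower())
        if d <= max_distance and (best is None or d < best[1]):
            best = (name, d)
    return best


medicines = ["crocin", "seroflo", "oxitab"]
print(find_medicine("crosin", medicines))   # ('crocin', 1)
print(find_medicine("aspirin", medicines))  # None
```

This scans every name once per query, so the cost grows with list length times name length. The cutoff is a judgment call: for very short names a distance of 2 may match unrelated drugs, so a threshold relative to name length may suit better.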



Answer 2:

I agree that the need for ML methods here is doubtful. But if you really want to use a learning-based method for spelling correction (I am not sure it works well for medicine names), you can refer to the papers below:

A winnow-based approach to context-sensitive spelling correction

An improved error model for noisy channel spelling correction

A large scale ranker-based system for search query spelling correction

A discriminative model for query spelling correction with latent structural SVM

A Graph Approach to Spelling Correction in Domain-Centric Search

This paper addresses spelling correction for personal names:

Hashing-based approaches to spelling correction of personal names