I'm trying to create an optical character recognition system with a dictionary.
In fact I don't have an implemented dictionary yet=)
I've heard that there are simple metrics based on Levenshtein distance which take into account different distances between different symbols. E.g. 'N' and 'H' look very similar, so d("THEATRE", "TNEATRE") should be less than d("THEATRE", "TOEATRE"), which is impossible with basic Levenshtein distance because every substitution costs the same.
Could you help me locate such a metric, please?
A few years too late, but the following Python package (with which I am NOT affiliated) allows for arbitrary weighting of all the Levenshtein edit operations, ASCII character mappings, etc.
https://github.com/infoscout/weighted-levenshtein
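To make the idea of weighted edit operations concrete, here is a minimal pure-Python sketch. It does not use the package above or its API; the function name `weighted_levenshtein`, the `CONFUSION` table and the 0.3 cost are all made up for illustration, and a real OCR system would estimate such costs from observed recognition errors.

```python
def weighted_levenshtein(a, b, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Levenshtein distance with per-pair substitution costs.

    sub_cost maps (char_in_a, char_in_b) to a cost; pairs that are
    not listed fall back to the usual cost of 1.0.
    """
    sub_cost = sub_cost or {}
    # prev[j] is the distance between the processed prefix of a and b[:j]
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                replace = prev[j - 1]  # characters match, no cost
            else:
                replace = prev[j - 1] + sub_cost.get((ca, cb), 1.0)
            curr.append(min(prev[j] + del_cost,      # delete ca
                            curr[j - 1] + ins_cost,  # insert cb
                            replace))                # substitute ca -> cb
        prev = curr
    return prev[-1]


# Hypothetical confusion table: 'H' and 'N' look alike, so swapping them is cheap.
CONFUSION = {("H", "N"): 0.3, ("N", "H"): 0.3}

d_near = weighted_levenshtein("THEATRE", "TNEATRE", CONFUSION)
d_far = weighted_levenshtein("THEATRE", "TOEATRE", CONFUSION)
print(d_near, d_far)
```

With this table the 'H'/'N' mix-up costs less than the 'H'/'O' one, which is the ordering the question asks for.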
Also this one (also not affiliated):
This might be what you are looking for: http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance (the page kindly includes some working code).
Update:
http://nlp.stanford.edu/IR-book/html/htmledition/edit-distance-1.html
Here is an example (C#) where the weight of the "replace character" operation depends on the distance between the character codes:
You can see how it works here: http://ideone.com/RblFK
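For readers without C#, the same idea can be sketched in Python. This is not a port of the code behind the link: the names `code_distance_cost` and `edit_distance`, the `scale` parameter and the cap at 1.0 are my own assumptions, chosen so that a replacement never costs more than an insertion or a deletion.

```python
def code_distance_cost(x, y, scale=25.0):
    """Replacement cost in [0, 1] that grows with the gap between character codes.

    scale is an assumed normalisation constant, not taken from the C# example.
    """
    return min(1.0, abs(ord(x) - ord(y)) / scale)


def edit_distance(a, b, sub_cost=code_distance_cost):
    """Levenshtein distance with unit insert/delete cost and a pluggable replace cost."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                   # delete ca
                curr[j - 1] + 1.0,               # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # replace (costs 0 when ca == cb)
            ))
        prev = curr
    return prev[-1]


d_n = edit_distance("THEATRE", "TNEATRE")
d_o = edit_distance("THEATRE", "TOEATRE")
print(d_n, d_o)
```

Note that character codes are only a rough stand-in for visual similarity: 'N' and 'O' are adjacent in ASCII, so the two distances above end up close to each other. Passing a different `sub_cost` function (for example one backed by a table of known OCR confusions) keeps the same algorithm but gives more meaningful weights.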