I have the following two strings:
a = 'bjork gudmundsdottir'
b = 'b. gudmundsson gunnar'
The Levenshtein distance between the two is 12
. When I use the following formula for Levenshtein distance, I get a discrepancy of 0.01
with the python-Levenshtein
library:
>>> Ldist / max(len( a ), len( b ))
>>> float(12)/21
0.5714285714285714
# python-Levenshtein
Levenshtein.ratio(a,b)
0.5853658536585366
# difflib
>>> seq=difflib.SequenceMatcher(a=a,b=b)
>>> seq.ratio()
0.5853658536585366
What accounts for this difference? What am I doing incorrectly in my calculation. Note that I have reviewed this How python-Levenshtein.ratio is computed similar question and it does not quite answer what I am asking.
Could someone please explain the formula that is used to calculate the above ratio?
From Lukas's comment, the reason for this is that the
ratio()
uses a cost of 2 for replace operations rather than the normal cost of 1 for the Levenshtein distance. Here's an example calculation: