I'm looking to compute the the Levenshtein-distance between sequences containing up to 6 values. The order of these values should not affect the distance.
How would I implement this into the iterative or recursive algorithm?
Example:
# Currently
>>> LDistance('dog', 'god')
2
# Sorted
>>> LDistance('dgo', 'dgo')
0
# Proposed
>>> newLDistance('dog', 'god')
0
'dog' and 'god' have the exact same letters, sorting the strings before hand will return the desired result. However this doesn't work all the time:
# Currently
>>> LDistance('doge', 'gold')
3
# Sorted
>>> LDistance('dego', 'dglo')
2
# Proposed
>>> newLDistance('doge', 'gold')
1
'doge' and 'gold' have 3/4 matching letters and so should return a distance of 1. Here is my current recursive code:
def mLD(s, t):
memo = {}
def ld(s, t):
if not s: return len(t)
if not t: return len(s)
if s[0] == t[0]: return ld(s[1:], t[1:])
if (s, t) not in memo:
l1 = ld(s, t[1:])
l2 = ld(s[1:], t)
l3 = ld(s[1:], t[1:])
memo[(s,t)] = 1 + min(l1, l2, l3)
return memo[(s,t)]
return ld(s, t)
EDIT: Followup question: Adding exceptions to Levenshtein-Distance-like algorithm