I'm trying to create a damerau-levenshtein distance function in JS.
I've found a description off the algorithm on WIkipedia, but they is no implementation off it. It says:
To devise a proper algorithm to calculate unrestricted Damerau–Levenshtein distance note that there always exists an optimal sequence of edit operations, where once-transposed letters are never modified afterwards. Thus, we need to consider only two symmetric ways of modifying a substring more than once: (1) transpose letters and insert an arbitrary number of characters between them, or (2) delete a sequence of characters and transpose letters that become adjacent after deletion. The straightforward implementation of this idea gives an algorithm of cubic complexity: O\left (M \cdot N \cdot \max(M, N) \right ), where M and N are string lengths. Using the ideas of Lowrance and Wagner,[7] this naive algorithm can be improved to be O\left (M \cdot N \right) in the worst case. It is interesting that the bitap algorithm can be modified to process transposition. See the information retrieval section of[1] for an example of such an adaptation.
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
The section [1] points to http://acl.ldc.upenn.edu/P/P00/P00-1037.pdf which is even more complicated to me.
If I understood this correctly, it's not that easy to create an implementation off it.
Here's the levenshtein implementation I currently use :
levenshtein=function (s1, s2) {
// discuss at: http://phpjs.org/functions/levenshtein/
// original by: Carlos R. L. Rodrigues (http://www.jsfromhell.com)
// bugfixed by: Onno Marsman
// revised by: Andrea Giammarchi (http://webreflection.blogspot.com)
// reimplemented by: Brett Zamir (http://brett-zamir.me)
// reimplemented by: Alexander M Beedie
// example 1: levenshtein('Kevin van Zonneveld', 'Kevin van Sommeveld');
// returns 1: 3
if (s1 == s2) {
return 0;
}
var s1_len = s1.length;
var s2_len = s2.length;
if (s1_len === 0) {
return s2_len;
}
if (s2_len === 0) {
return s1_len;
}
// BEGIN STATIC
var split = false;
try {
split = !('0')[0];
} catch (e) {
// Earlier IE may not support access by string index
split = true;
}
// END STATIC
if (split) {
s1 = s1.split('');
s2 = s2.split('');
}
var v0 = new Array(s1_len + 1);
var v1 = new Array(s1_len + 1);
var s1_idx = 0,
s2_idx = 0,
cost = 0;
for (s1_idx = 0; s1_idx < s1_len + 1; s1_idx++) {
v0[s1_idx] = s1_idx;
}
var char_s1 = '',
char_s2 = '';
for (s2_idx = 1; s2_idx <= s2_len; s2_idx++) {
v1[0] = s2_idx;
char_s2 = s2[s2_idx - 1];
for (s1_idx = 0; s1_idx < s1_len; s1_idx++) {
char_s1 = s1[s1_idx];
cost = (char_s1 == char_s2) ? 0 : 1;
var m_min = v0[s1_idx + 1] + 1;
var b = v1[s1_idx] + 1;
var c = v0[s1_idx] + cost;
if (b < m_min) {
m_min = b;
}
if (c < m_min) {
m_min = c;
}
v1[s1_idx + 1] = m_min;
}
var v_tmp = v0;
v0 = v1;
v1 = v_tmp;
}
return v0[s1_len];
}
What are your ideas for building such an algorithm and, if you think it would be too complicated, what could I do to make no difference between 'l' (L lowercase) and 'I' (i uppercase) for example.