I am using Jaro-Winkler fuzzy matching to match names.
I am trying to determine a cutoff for the similarity score: if two names are too different, I want to set the pair aside for manual review.
Anything scoring below 0.4 seemed to be an entirely different name, while pairs in the 0.4 range seemed fairly similar.
But then I came across strange exceptions: some pairs in that range are entirely different names, while others are only one or two letters off (see the examples below).
Can someone explain why there is such wide variation among pairs within the same score range?
Estrella ANNELISE 0.42
Arienna IREANNA 0.43
Tayvia I TAYVIA 0.43
Amanda IZABEL 0.44
Hunter JOSHUA 0.44
Ryder CHARLES 0.45
Luis ELIZABETH 0.45
Sebastian JOSE 0.45
Christopher CHISTOPHE 0.46
Genayunique GENAY-UNI 0.46
Andreeaonn ADREEAONN 0.46
Chistopher CHRISTOPH 0.46
Dazharicon DAZHARION 0.46
Jennavecia JENNACVEC 0.46
Valentiria VALENTINA 0.46
Abel SAMMUEL 0.46
Dezarea Marie DEZAREA 0.47
Alexander ALEXZANDE 0.47
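
To make it easier to check scores like these, here is a minimal sketch of a textbook Jaro-Winkler in Python. It is an assumption, not the implementation that produced the numbers above: the function names `jaro` and `jaro_winkler` are my own, and the 0.1 prefix weight and 4-character prefix cap are the commonly cited defaults, which a given tool may not use. Because the two columns above differ in letter case, the sketch scores one pair both as written and after lowercasing, so the effect of case can be seen directly.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity; characters are compared exactly (case matters)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_weight: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted for a shared prefix (assumed default parameters)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1 - base)


# Score one pair from the list as written, then with case normalized.
a, b = "Christopher", "CHISTOPHE"
print(round(jaro_winkler(a, b), 2))
print(round(jaro_winkler(a.lower(), b.lower()), 2))
```

If the as-written score is close to the one in my list and the lowercased score is much higher, case sensitivity would be one candidate explanation for the near-identical names that score so low. It would not necessarily explain every row, since my tool may differ from this sketch in other ways.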