My manager tells me that there is a way to evaluate names that are spelled differently but sound similar in the way they are pronounced. Ideally, we want to be able to evaluate a user-entered search name and return exact matches as well as "similar sounding" names. He called the process "Soundits" but I cannot find any info on Google.
Does this exist? Does anyone know if it is available for VBA (Access)?
Nice question! You're question includes a great example of the idea itself.
There is an algorithm called the Russell Soundex algorithm, a standard technique in many applications, that evaluates names by the phonetic rather than the actual spelling. In this question, Soundits and Soundex are similar sounding names! [EDIT: Just ran the Soundex. Soundits=S532 and Soundex=S532.]
About Soundex:
The Soundex algorithm is predicated on characteristics of English such as:
- The first letter has high significance
- Many consonants sound similar
- Consonants affect pronunciation more than vowels
One warning: Soundex was designed for names. The shorter the better. As a name grows longer, the Soundex becomes less reliable.
Resources:
- Here is an example that uses VBA for Access.
- There is a write-up on Soundex in the VBA Developer's Handbook, 2nd Edition by Ken Getz and Mike Gilbert.
- There is a lot of information about Soundex and other variants such as Soundex2 (Search for 'Soundex' and 'VBA').
Code Example:
Below is some VBA code, found via a quick web search, that implements a variation of the Soundex algorithm.
Option Compare Database
Option Explicit
Public Function Soundex(varText As Variant) As Variant
On Error GoTo Err_Handler
Dim strSource As String
Dim strOut As String
Dim strValue As String
Dim strPriorValue As String
Dim lngPos As Long
If Not IsError(varText) Then
strSource = Trim$(Nz(varText, vbNullString))
If strSource <> vbNullString Then
strOut = Left$(strSource, 1&)
strPriorValue = SoundexValue(strOut)
lngPos = 2&
Do
strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
strOut = strOut & strValue
strPriorValue = strValue
End If
lngPos = lngPos + 1&
Loop Until Len(strOut) >= 4&
End If
End If
If strOut <> vbNullString Then
Soundex = strOut
Else
Soundex = Null
End If
Exit_Handler:
Exit Function
Err_Handler:
MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
Resume Exit_Handler
End Function
Private Function SoundexValue(strChar As String) As String
Select Case strChar
Case "B", "F", "P", "V"
SoundexValue = "1"
Case "C", "G", "J", "K", "Q", "S", "X", "Z"
SoundexValue = "2"
Case "D", "T"
SoundexValue = "3"
Case "L"
SoundexValue = "4"
Case "M", "N"
SoundexValue = "5"
Case "R"
SoundexValue = "6"
Case vbNullString
SoundexValue = "0"
Case Else
'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
End Select
End Function
Levenshtein distance
Another method of comparing strings is to get the Levenshtein distance. Here is the example given in VBA, it is taken from LessThanDot Wiki:
Function LevenshteinDistance(word1, word2)
Dim s As Variant
Dim t As Variant
Dim d As Variant
Dim m, n
Dim i, j, k
Dim a(2), r
Dim cost
m = Len(word1)
n = Len(word2)
''This is the only way to use
''variables to dimension an array
ReDim s(m)
ReDim t(n)
ReDim d(m, n)
For i = 1 To m
s(i) = Mid(word1, i, 1)
Next
For i = 1 To n
t(i) = Mid(word2, i, 1)
Next
For i = 0 To m
d(i, 0) = i
Next
For j = 0 To n
d(0, j) = j
Next
For i = 1 To m
For j = 1 To n
If s(i) = t(j) Then
cost = 0
Else
cost = 1
End If
a(0) = d(i - 1, j) + 1 '' deletion
a(1) = d(i, j - 1) + 1 '' insertion
a(2) = d(i - 1, j - 1) + cost '' substitution
r = a(0)
For k = 1 To UBound(a)
If a(k) < r Then r = a(k)
Next
d(i, j) = r
Next
Next
LevenshteinDistance = d(m, n)
End Function
Here are a couple working examples of the SOUNDEX algorithm in VBA:
- Demo in MS Access
- Demo in MS Excel
In addition to Soundex, which is often gives you too loose a match to be really useful, you should also look at Soundex2 (a variant of Soundex that is more granular), and for a different kind of matching, Simil(). I use all three.
You are looking for SOUNDEX.
Also consider using the first two or three letters of the first name and last name. In a database I had of 10,000 names Jo Sm (Joe/John/Joan Smith) returned only three or four records.
Also what type of first names. Are you going to get folks using the shortened version? For example my legal first name is Anthony but I'm always called Tony.