Finding similar sounding text in VBA [closed]

2019-01-21 20:46发布

问题:

My manager tells me that there is a way to evaluate names that are spelled differently but sound similar in the way they are pronounced. Ideally, we want to be able to evaluate a user-entered search name and return exact matches as well as "similar sounding" names. He called the process "Soundits" but I cannot find any info on Google.

Does this exist? Does anyone know if it is available for VBA (Access)?

回答1:

Nice question! You're question includes a great example of the idea itself.

There is an algorithm called the Russell Soundex algorithm, a standard technique in many applications, that evaluates names by the phonetic rather than the actual spelling. In this question, Soundits and Soundex are similar sounding names! [EDIT: Just ran the Soundex. Soundits=S532 and Soundex=S532.]

About Soundex:

The Soundex algorithm is predicated on characteristics of English such as:

  1. The first letter has high significance
  2. Many consonants sound similar
  3. Consonants affect pronunciation more than vowels

One warning: Soundex was designed for names. The shorter the better. As a name grows longer, the Soundex becomes less reliable.

Resources:

  1. Here is an example that uses VBA for Access.
  2. There is a write-up on Soundex in the VBA Developer's Handbook, 2nd Edition by Ken Getz and Mike Gilbert.
  3. There is a lot of information about Soundex and other variants such as Soundex2 (Search for 'Soundex' and 'VBA').

Code Example:

Below is some VBA code, found via a quick web search, that implements a variation of the Soundex algorithm.

Option Compare Database
Option Explicit

Public Function Soundex(varText As Variant) As Variant
On Error GoTo Err_Handler
    Dim strSource As String
    Dim strOut As String
    Dim strValue As String
    Dim strPriorValue As String
    Dim lngPos As Long

    If Not IsError(varText) Then
        strSource = Trim$(Nz(varText, vbNullString))
        If strSource <> vbNullString Then
            strOut = Left$(strSource, 1&)
            strPriorValue = SoundexValue(strOut)
            lngPos = 2&

            Do
                strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
                If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
                    strOut = strOut & strValue
                    strPriorValue = strValue
                End If
                lngPos = lngPos + 1&
            Loop Until Len(strOut) >= 4&
        End If
    End If

    If strOut <> vbNullString Then
        Soundex = strOut
    Else
        Soundex = Null
    End If

Exit_Handler:
    Exit Function

Err_Handler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
    Resume Exit_Handler
End Function
Private Function SoundexValue(strChar As String) As String
    Select Case strChar
    Case "B", "F", "P", "V"
        SoundexValue = "1"
    Case "C", "G", "J", "K", "Q", "S", "X", "Z"
        SoundexValue = "2"
    Case "D", "T"
        SoundexValue = "3"
    Case "L"
        SoundexValue = "4"
    Case "M", "N"
        SoundexValue = "5"
    Case "R"
        SoundexValue = "6"
    Case vbNullString
        SoundexValue = "0"
    Case Else
        'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
    End Select
End Function

Levenshtein distance

Another method of comparing strings is to get the Levenshtein distance. Here is the example given in VBA, it is taken from LessThanDot Wiki:

Function LevenshteinDistance(word1, word2)

Dim s As Variant
Dim t As Variant
Dim d As Variant
Dim m, n
Dim i, j, k
Dim a(2), r
Dim cost

   m = Len(word1)
   n = Len(word2)

   ''This is the only way to use
   ''variables to dimension an array
   ReDim s(m)
   ReDim t(n)
   ReDim d(m, n)

   For i = 1 To m
       s(i) = Mid(word1, i, 1)
   Next

   For i = 1 To n
       t(i) = Mid(word2, i, 1)
   Next

   For i = 0 To m
       d(i, 0) = i
   Next

   For j = 0 To n
       d(0, j) = j
   Next


   For i = 1 To m
       For j = 1 To n

           If s(i) = t(j) Then
               cost = 0
           Else
               cost = 1
           End If

           a(0) = d(i - 1, j) + 1             '' deletion
           a(1) = d(i, j - 1) + 1             '' insertion
           a(2) = d(i - 1, j - 1) + cost      '' substitution

           r = a(0)

           For k = 1 To UBound(a)
               If a(k) < r Then r = a(k)
           Next

           d(i, j) = r

       Next

   Next

   LevenshteinDistance = d(m, n)

End Function


回答2:

Here are a couple working examples of the SOUNDEX algorithm in VBA:

  • Demo in MS Access
  • Demo in MS Excel


回答3:

In addition to Soundex, which is often gives you too loose a match to be really useful, you should also look at Soundex2 (a variant of Soundex that is more granular), and for a different kind of matching, Simil(). I use all three.



回答4:

You are looking for SOUNDEX.



回答5:

Also consider using the first two or three letters of the first name and last name. In a database I had of 10,000 names Jo Sm (Joe/John/Joan Smith) returned only three or four records.

Also what type of first names. Are you going to get folks using the shortened version? For example my legal first name is Anthony but I'm always called Tony.