Excel Exact Word Matching

2019-02-25 22:35发布

问题:

Let's say I have "Vegas is great" in cell A1. I want to write a formula that looks for the exact word, "gas" in cells. Vegas ≠ gas, but the only search formula I'm finding:

 =ISNUMBER(SEARCH("gas",lower(A1))

returns true. Is there anyway to do do exact matching? I'd ideally like it to be non-case sensitive which I believe is satisfied by wrapping A1 in lower().

回答1:

I believe to correctly cover cases you have to pad spaces before and after the term "gas" and the search term. This will ensure that gas will be found at the beginning or end of a cell, and also prevent it from being found in the middle of any words. Your post does not indicate whether punctuation can exist in the file, but to accomodate punctuation padding spaces around the search will not work correctly, you would have to include the case of " gas. " " gas! " etc to allow for any punctuation specifically. If you are worried about catching values like "gas.cost" or similar you can use the same padding around the punctuation search.

=Or(ISNUMBER(SEARCH(" gas ", " "&A1&" ")),ISNUMBER(SEARCH(" gas. ", " "&A1&" ")))

Is a basic search that should return the word gas by itself, or "gas." By padding a space after "gas." in the search it will find it as the final word in a sentence, or at the end of a cell.

Edit: Dropped a parentheses.



回答2:

The Find function is case sensitive. The SEARCH function is not. There is no need for the LOWER function if you are using SEARCH.

SEARCH(<find_text>, <within_text>, [optional]<start_num>)

Wrap both the find_text and within_text in spaces and perform your SEARCH.

        

The formula in B1 is,

=ISNUMBER(SEARCH(" gas ", " "&A1&" "))

Fill down as necessary.



回答3:

One can also use regular expressions in VBA to accomplish this. In Regular Expressions, "\b" represents a word boundary. A word boundary is defined as the position between a word and a non-word character or the beginning or end of the line. Word characters are [A-Za-z0-9_] (letters, digits, and the underscore). Hence, one can use this UDF. You do need to be aware that words which include non-word characters (e.g. a hyphen) may be treated differently than you expect. And if you are dealing with non-English letters, the Pattern would need to be modified.

But the code is fairly compact.

Option Explicit
Function reFindWord(FindWord As String, SearchText As String, Optional MatchCase As Boolean = False) As Boolean
    Dim RE As Object
    Dim sPattern As String
Set RE = CreateObject("vbscript.regexp")
sPattern = "\b" & FindWord & "\b"
With RE
    .Pattern = sPattern
    .ignorecase = Not MatchCase
    reFindWord = .test(SearchText)
End With
End Function


回答4:

I think the only way to cover all possible punctuation surrounding the search word is to create a custom macro function. Use the enhanced split function to tokenize the sentence into an array of words then search the array for a match.

Enhanced split function https://msdn.microsoft.com/en-us/library/aa155763

How to create custom macro http://www.wikihow.com/Create-a-User-Defined-Function-in-Microsoft-Excel

Code to create FindEngWord function

Public Function FindEngWord(ByVal TextToSearch As String, ByVal WordToFind As String) As Boolean

Dim WrdArray() As String
Dim text_string As String
Dim isFound As Boolean

isFound = False

text_string = TextToSearch

WrdArray() = Split(text_string)

isFound = False
For i = 0 To UBound(WrdArray)
    If LCase(WrdArray(i)) = LCase(WordToFind) Then
        isFound = True
    End If
Next i

FindEngWord = isFound

End Function


Public Function Split(ByVal InputText As String, _
         Optional ByVal Delimiter As String) As Variant

' This function splits the sentence in InputText into
' words and returns a string array of the words. Each
' element of the array contains one word.

    ' This constant contains punctuation and characters
    ' that should be filtered from the input string.
    Const CHARS = ".!?,;:""'()[]{}"
    Dim strReplacedText As String
    Dim intIndex As Integer

    ' Replace tab characters with space characters.
    strReplacedText = Trim(Replace(InputText, _
         vbTab, " "))

    ' Filter all specified characters from the string.
    For intIndex = 1 To Len(CHARS)
        strReplacedText = Trim(Replace(strReplacedText, _
            Mid(CHARS, intIndex, 1), " "))
    Next intIndex

    ' Loop until all consecutive space characters are
    ' replaced by a single space character.
    Do While InStr(strReplacedText, "  ")
        strReplacedText = Replace(strReplacedText, _
            "  ", " ")
    Loop

    ' Split the sentence into an array of words and return
    ' the array. If a delimiter is specified, use it.
    'MsgBox "String:" & strReplacedText
    If Len(Delimiter) = 0 Then
        Split = VBA.Split(strReplacedText)
    Else
        Split = VBA.Split(strReplacedText, Delimiter)
    End If
End Function

Can be called from your excel sheet with this.

=FindEngWord(A1,"gas")


回答5:

I think this will handle all the cases that you are planning to handle:

=OR(ISNUMBER(SEARCH(" gas",LOWER(A1), 1 )), LEFT(A1,3)= "gas")

I added a space before the "gas" in the search. And if the gas was the only word in the cell or the first word in the cell, the right part of this function handles that case.