Removing Any Non-Specified Characters from an Exce

2019-07-14 16:56发布

I'm trying to clean up a .CSV file in Excel by getting rid of any non-standard characters. The only characters I care about keeping are A-Z, 0-9, and a few standard punctuation marks. Any other characters, I'd like to delete.

I've gotten the following macro to delete an entire row when it finds a cell which contains any characters I haven't specified, but I'm not sure how to get it to actually delete the character itself.

Sub Replace()
Dim sCharOK As String, s As String
Dim r As Range, rc As Range
Dim j As Long

sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"

Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)

' loop through all the cells with text constant values and deletes the rows with characters not in sCharOK
For Each rc In r
    s = rc.Value
    For j = 1 To Len(s)
        If InStr(sCharOK, Mid(s, j, 1)) = 0 Then
            rc.EntireRow.Delete
            Exit For
        End If
    Next j
Next rc

End Sub

I assume there's a fairly simple way to adapt this code to that function, but I'm not familiar enough with VBA to really know how to go about doing that. Any insights are greatly appreciated!

4条回答
劫难
2楼-- · 2019-07-14 17:17

If it were me, I would use a replace command on the original string every time I find an invalid char, changing that invalid char to null. Then replace the original cell value with the modified string. Something like this...

One possible way (tested)

Sub RemoveInvalidCharacters()
Dim sCharOK As String, s As String
Dim r As Range, rc As Range
Dim j As Long
Dim badchar As Boolean

sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"

Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)

' loop through all the cells with text constant values and
' deletes the invalid characters not in sCharOK from each Value property
For Each rc In r
    badchar = False
    s = rc.Value
    For j = 1 To Len(s)
        If InStr(sCharOK, Mid(s, j, 1)) = 0 Then
            badchar = True
            s = Replace(s, Mid(s, j, 1), "")
        End If
    Next j
    If badchar Then
        rc.Value = s
    End If
Next rc

End Sub
查看更多
可以哭但决不认输i
3楼-- · 2019-07-14 17:26

I just had to do this today, literally. The script below worked perfect fine for me.

Sub Clean_and_Trim_Cells()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Dim s As String
    For Each c In ActiveSheet.UsedRange
        s = c.Value
        If Trim(Application.Clean(s)) <> s Then
            s = Trim(Application.Clean(s))
            c.Value = s
        End If
    Next
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
End Sub
查看更多
放我归山
4楼-- · 2019-07-14 17:31

You could also use regular expressions, thereby avoiding needing to examine each character in a loop. (Although the regex engine has to do that).

The Regex pattern, explained below, contains your list of characters, and the character class used says match everything that is not listed.

If speed becomes an issue, you can use vba arrays to speed things up.

Option Explicit
Sub ReplaceNonStdChars()
    Const sPat As String = "[^\x20-\x7E\x99\xAE]"
    Dim RE As Object
    Dim R As Range, C As Range

Set R = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)

Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .Pattern = sPat
    For Each C In R
        C.Value = .Replace(C.Text, "")
    Next C
End With
End Sub

Explanation of Regex Pattern

[^\x20-\x7E\x99\xAE]

[^\x20-\x7E\x99\xAE]

Created with RegexBuddy

查看更多
贼婆χ
5楼-- · 2019-07-14 17:42

Another way would be Range.Replace like:

Sub test()
  Dim sCharOK As String
  sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®" & Chr(1)
  Dim i As Long
  For i = 0 To 255
    If InStr(sCharOK, Chr(i)) = 0 Then
      ActiveSheet.Cells.Replace What:=Chr(i), Replacement:="", LookAt:=xlPart, MatchCase:=True, SearchFormat:=False, ReplaceFormat:=False
    End If
  Next
End Sub

EDIT

looking at @ryguy72 answer also offers another way if only non-printable characters need to be deleted (at the question something like µ²äöüßÉõ will be deleted but this code will not) also assuming that there are no formulas:

Sub test()
  With ActiveSheet.UsedRange
    .Value = Evaluate("TRIM(CLEAN(" & .Address & "))")
  End With
End Sub

Or directly run in Immediate window this one-liner:

ActiveSheet.UsedRange.Value = Evaluate("TRIM(CLEAN(" & ActiveSheet.UsedRange.Address & "))")
查看更多
登录 后发表回答