I'm trying to clean up a .CSV file in Excel by getting rid of any non-standard characters. The only characters I care about keeping are A-Z, 0-9, and a few standard punctuation marks. Any other characters, I'd like to delete.
I've gotten the following macro to delete an entire row when it finds a cell which contains any characters I haven't specified, but I'm not sure how to get it to actually delete the character itself.
Sub Replace()
Dim sCharOK As String, s As String
Dim r As Range, rc As Range
Dim j As Long
sCharOK = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789, `~!@#$%^&*()_+-=[]\{}|;':"",./<>?™®"
Set r = Worksheets("features").UsedRange.SpecialCells(xlCellTypeConstants, xlTextValues)
' loop through all the cells with text constant values and deletes the rows with characters not in sCharOK
For Each rc In r
s = rc.Value
For j = 1 To Len(s)
If InStr(sCharOK, Mid(s, j, 1)) = 0 Then
rc.EntireRow.Delete
Exit For
End If
Next j
Next rc
End Sub
I assume there's a fairly simple way to adapt this code to that function, but I'm not familiar enough with VBA to really know how to go about doing that. Any insights are greatly appreciated!
If it were me, I would use a replace command on the original string every time I find an invalid char, changing that invalid char to null. Then replace the original cell value with the modified string. Something like this...
One possible way (tested)
I just had to do this today, literally. The script below worked perfect fine for me.
You could also use regular expressions, thereby avoiding needing to examine each character in a loop. (Although the regex engine has to do that).
The Regex pattern, explained below, contains your list of characters, and the character class used says match everything that is not listed.
If speed becomes an issue, you can use vba arrays to speed things up.
Explanation of Regex Pattern
[^\x20-\x7E\x99\xAE]
[^\x20-\x7E\x99\xAE]
\x20-\x7E
\x20
\x7E
\x99
\xAE
Created with RegexBuddy
Another way would be
Range.Replace
like:EDIT
looking at @ryguy72 answer also offers another way if only non-printable characters need to be deleted (at the question something like
µ²äöüßÉõ
will be deleted but this code will not) also assuming that there are no formulas:Or directly run in Immediate window this one-liner: