What command can I use to identify and remove certain strange characters that form "words" such as:
í‰äó_
퀌¢í‰ä‰åí‰ä‹¢
it퀌¢í‰ä‰åí‰ä‹¢
í‰äóìgo
from a series of files? Those are some examples... I want to remove such occurrences.
Using the string module after you've read the data from the file:
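import string

# my_str holds the text read from the file
final_str = ''
for char in my_str:
    # keep only ASCII printable characters (letters, digits, punctuation, whitespace)
    if char in string.printable:
        final_str += char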
Alternative one-liner:
''.join(char for char in my_str if char in string.printable)
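Since the question is about a series of files, here is a minimal sketch of how the same filter could be applied to each file in place. The helper names `strip_non_printable` and `clean_file` are made up for illustration, and UTF-8 input is an assumption; undecodable bytes are turned into replacement characters, which the filter then drops.

```python
import string
from pathlib import Path

# Set lookup is faster than scanning the string.printable string each time.
PRINTABLE = frozenset(string.printable)


def strip_non_printable(text: str) -> str:
    """Keep only characters that appear in string.printable."""
    return ''.join(ch for ch in text if ch in PRINTABLE)


def clean_file(path: Path, encoding: str = 'utf-8') -> None:
    """Rewrite one file in place with non-printable characters removed.

    Assumption: the file is text in the given encoding; bytes that cannot
    be decoded become U+FFFD and are then filtered out like any other
    non-ASCII character.
    """
    original = path.read_text(encoding=encoding, errors='replace')
    path.write_text(strip_non_printable(original), encoding=encoding)
```

To process several files, call `clean_file(Path(name))` for each name, for example over `Path('.').glob('*.txt')`. This overwrites the files, so it is worth trying it on copies first.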
Since you tagged shell and command-line, here you go:
$ tr -cd '[:graph:][:space:]' < foo.txt
_
it
go
How about a regex substitution? Something like:
import re
clean_name = re.sub(r'[^a-zA-Z0-9\._-]', '', dirty_name)
Add any other allowed characters to the character class.
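Note that the pattern above also removes spaces and newlines, because they are not in the allowed set. That is fine for a single name, but for whole files you probably want to keep whitespace. A small sketch, assuming letters, digits, `.`, `_`, `-` and ASCII whitespace are the characters to keep (`clean_text` is a hypothetical helper name):

```python
import re

# Assumption: everything outside this set counts as a "strange" character.
# re.ASCII restricts \s to ASCII whitespace (space, tab, newline, ...).
DISALLOWED = re.compile(r'[^a-zA-Z0-9._\s-]', re.ASCII)


def clean_text(dirty: str) -> str:
    """Remove every character that is not in the allowed set."""
    return DISALLOWED.sub('', dirty)
```

With this, `clean_text('it퀌¢ is í‰äóìgo\n')` keeps the spaces and the trailing newline and drops only the stray characters.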