Identify and remove strange characters

Posted 2019-09-18 02:52

What command can I use to identify and remove certain strange characters that form "words" such as:

í‰äó_
퀌¢í‰ä‰åí‰ä‹¢
it퀌¢í‰ä‰åí‰ä‹¢
í‰äóìgo

from a series of files? These are just a few examples; I want to remove all such occurrences.

3 answers
爷、活的狠高调
Reply #2 · 2019-09-18 03:38

Since you tagged this with shell and command-line, here you go:

$ tr -cd '[:graph:][:space:]' < foo.txt
_

it
go
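For readers who prefer Python, here is a minimal sketch that mirrors the tr filter above in the C locale: it keeps printable ASCII (0x21 to 0x7E) plus the six ASCII whitespace characters and deletes everything else. The helper name `keep_ascii_graph_space` is hypothetical, and treating `[:space:]` as exactly those six characters is an assumption about the C locale.

```python
import re

# Any character that is neither ASCII graphic (0x21-0x7E) nor one of the
# ASCII whitespace characters (space, \t, \n, \v, \f, \r).
_NOT_GRAPH_OR_SPACE = re.compile(r"[^\x21-\x7E \t\n\v\f\r]")


def keep_ascii_graph_space(text: str) -> str:
    """Hypothetical helper: delete every character outside [:graph:][:space:]."""
    return _NOT_GRAPH_OR_SPACE.sub("", text)


sample = "í‰äó_\n퀌¢í‰ä‰åí‰ä‹¢\nit퀌¢í‰ä‰åí‰ä‹¢\ní‰äóìgo\n"
print(keep_ascii_graph_space(sample))
```

On the question's examples this yields the same four lines as the tr run. One difference: GNU tr works byte by byte, while this works on decoded characters, so the file must first be read with its correct encoding.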
小情绪 Triste *
Reply #3 · 2019-09-18 03:38

How about a regex sub?

Something like:

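import re

# dirty_name is the input string. Keep only ASCII letters, digits, '.', '_' and '-';
# note that this also deletes spaces and newlines.
clean_name = re.sub(r'[^a-zA-Z0-9._-]', '', dirty_name)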

Add any other characters you want to allow to the character class.
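The question also asks how to *identify* the odd "words" before removing them. A minimal sketch, assuming a "word" is a whitespace-separated token and "strange" means it contains any character outside this answer's whitelist (widened with `\s` so line structure survives). Both function names are hypothetical.

```python
import re

# Whitelist from the regex answer plus \s; anything else counts as "strange" (assumption).
_BAD_CHAR = re.compile(r"[^a-zA-Z0-9._\s-]")


def find_strange_words(text: str) -> list[str]:
    """Return whitespace-separated tokens containing at least one disallowed character."""
    return [word for word in text.split() if _BAD_CHAR.search(word)]


def strip_strange(text: str) -> str:
    """Delete disallowed characters while keeping whitespace and newlines."""
    return _BAD_CHAR.sub("", text)


text = "hello it퀌¢í‰ä‰åí‰ä‹¢ world\ní‰äóìgo done"
print(find_strange_words(text))
print(strip_strange(text))
```

Listing the matches first makes it easy to check that the whitelist does not catch legitimate words before anything is deleted.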

来,给爷笑一个
Reply #4 · 2019-09-18 03:53

Using the string module, once you have read the data from the file:

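import string

# my_str holds the text already read from the file.
final_str = ''
for char in my_str:
    # string.printable: ASCII digits, letters, punctuation and whitespace
    if char in string.printable:
        final_str += char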

Alternative one-liner:

final_str = ''.join(char for char in my_str if char in string.printable)
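Since the question mentions a series of files, here is a minimal sketch that applies the string.printable filter to each file named on the command line and rewrites it in place. It assumes the files are UTF-8 text and that overwriting them is acceptable; the script and the `clean_file` helper are hypothetical.

```python
import string
import sys
from pathlib import Path

# A set gives fast membership tests compared with scanning the string each time.
PRINTABLE = set(string.printable)


def clean_file(path: Path) -> int:
    """Hypothetical helper: rewrite path keeping only string.printable characters.

    Assumes UTF-8 text; returns the number of characters removed.
    """
    original = path.read_text(encoding="utf-8")
    cleaned = "".join(char for char in original if char in PRINTABLE)
    path.write_text(cleaned, encoding="utf-8")
    return len(original) - len(cleaned)


if __name__ == "__main__":
    # Usage: python clean_files.py file1.txt file2.txt ...
    for name in sys.argv[1:]:
        removed = clean_file(Path(name))
        print(f"{name}: removed {removed} characters")
```

Rewriting in place cannot be undone, so it may be worth running it on copies first.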