Remove Unicode characters from text files - sed , o

Posted 2019-01-04 11:03

How do I remove Unicode characters from a bunch of text files on the terminal? I've tried this, but it didn't work:

sed 'g/\u'U+200E'//' -i *.txt

I need to remove these Unicode characters from the text files:

U+0091 - sort of weird "control" space
U+0092 - same sort of weird "control" space
U+00A0 - no-break space
U+200E - left-to-right mark

5 answers
劳资没心,怎么记你
#2 · 2019-01-04 11:37

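If you want to remove ONLY particular characters and you have Python 3, you can build the character set with Python and pass it to sed (this needs a UTF-8 locale, so that sed treats each multibyte character as a single character):

CHARS=$(python3 -c 'import sys; sys.stdout.buffer.write("\u0091\u0092\u00a0\u200e".encode("utf-8"))')
sed 's/['"$CHARS"']//g' < /tmp/utf8_input.txt > /tmp/ascii_output.txt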
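If sed is not a hard requirement, the same removal can be sketched in Python alone. This is only a minimal sketch, assuming the files are valid UTF-8; the names `strip_chars` and `clean_file` are made up for illustration.

```python
from pathlib import Path

# The four code points listed in the question.
UNWANTED = "\u0091\u0092\u00a0\u200e"

# With three arguments, str.maketrans maps every character of the
# third argument to None, which makes translate() delete it.
DELETE_TABLE = str.maketrans("", "", UNWANTED)


def strip_chars(text: str) -> str:
    """Return text with the unwanted code points removed."""
    return text.translate(DELETE_TABLE)


def clean_file(path: str) -> None:
    """Hypothetical helper: rewrite one UTF-8 file in place."""
    p = Path(path)
    p.write_text(strip_chars(p.read_text(encoding="utf-8")), encoding="utf-8")
```

Calling `clean_file` for every path matched by `Path(".").glob("*.txt")` would cover the "bunch of text files" case. All other characters, including other non-ASCII ones, are left alone.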
霸刀☆藐视天下
#3 · 2019-01-04 11:41

Use iconv:

iconv -f utf8 -t ascii//TRANSLIT < /tmp/utf8_input.txt > /tmp/ascii_output.txt

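This transliterates characters such as "Š" to the most similar-looking ASCII character ("S"). Note that it converts every non-ASCII character in the file, not only the four listed in the question.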
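To get a feel for what transliteration does, here is a rough Python approximation. It is not iconv's algorithm: it uses Unicode NFKD decomposition from the standard `unicodedata` module and then drops whatever is still outside ASCII. The function name `to_ascii` is made up for this sketch.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate text in ASCII via Unicode decomposition."""
    # NFKD splits "Š" into "S" plus a combining caron and
    # turns U+00A0 (no-break space) into a plain space.
    decomposed = unicodedata.normalize("NFKD", text)
    # Whatever is still outside ASCII (combining marks, U+200E, ...)
    # is dropped rather than replaced.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(to_ascii("\u0160koda"))  # prints: Skoda
```

Letters without a decomposition, such as "ß" or "ø", simply disappear with this approach, so it is cruder than a real transliteration table.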

家丑人穷心不美
#4 · 2019-01-04 11:45

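Convert Swift files from UTF-8 to ASCII in place. Without -c, iconv stops with an error at the first character that has no ASCII equivalent, so the original is replaced only when the conversion succeeds:

for file in *.swift; do
    if iconv -f utf-8 -t ascii "$file" > "$file.tmp"; then
        mv -f "$file.tmp" "$file"
    else
        rm -f "$file.tmp"   # conversion failed; keep the original file
    fi
done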

swift auto completion not working in Xcode6-Beta

我命由我不由天
#5 · 2019-01-04 11:58

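Remove all non-ASCII characters from file.txt. Either command prints the result to standard output and leaves file.txt itself unchanged. Note that strings is not an exact equivalent: by default it also drops runs of fewer than four printable characters.

$ iconv -c -f utf-8 -t ascii file.txt
$ strings file.txt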
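For comparison, the effect of `iconv -c` (silently discard whatever cannot be represented in ASCII) can be sketched in Python. This is an illustrative equivalent, not what iconv does internally, and `drop_non_ascii` is a made-up name.

```python
def drop_non_ascii(text: str) -> str:
    """Discard every character that has no ASCII encoding."""
    # errors="ignore" skips unencodable characters instead of raising.
    return text.encode("ascii", errors="ignore").decode("ascii")


print(drop_non_ascii("caf\u00e9\u00a0ok\u200e"))  # prints: cafok
```

This removes accented letters as well, so it is only appropriate when losing them is acceptable.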
疯言疯语
#6 · 2019-01-04 11:58

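For UTF-8 encoded input, you can match the byte sequences of those characters directly with GNU sed. LC_ALL=C makes sed work on raw bytes, and the g flag removes every occurrence on a line rather than only the first:

LC_ALL=C sed 's/\xc2\x91\|\xc2\x92\|\xc2\xa0\|\xe2\x80\x8e//g'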
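The byte sequences in that expression are just the UTF-8 encodings of the four code points from the question. A short Python sketch can derive them and apply the same byte-level removal; the names `ENCODED`, `PATTERN` and `strip_bytes` are made up for illustration.

```python
import re

# Code points from the question.
CODE_POINTS = [0x0091, 0x0092, 0x00A0, 0x200E]

# UTF-8 encoding of each code point:
# b'\xc2\x91', b'\xc2\x92', b'\xc2\xa0', b'\xe2\x80\x8e'
ENCODED = [chr(cp).encode("utf-8") for cp in CODE_POINTS]

# Alternation over the raw byte sequences, like the sed expression.
PATTERN = re.compile(b"|".join(re.escape(seq) for seq in ENCODED))


def strip_bytes(data: bytes) -> bytes:
    """Remove every occurrence of the encoded sequences from raw bytes."""
    return PATTERN.sub(b"", data)
```

Working on bytes means the file is never decoded, so the approach does not depend on the locale, but it is only correct if the input really is UTF-8.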