How do I get rid of this unicode character?

Posted 2019-05-10 13:54

Any idea how to get rid of the irritating character U+0092 in a bunch of text files? I've tried all of the commands below, but none of them work. The character map shows it as U+0092 <control>.

sed -i 's/\xc2\x92//' *
sed -i 's/\u0092//' *
sed -i 's///' *

Ah, I've found a way:

CHARS=$(python2 -c 'print u"\u0092".encode("utf8")')
sed 's/['"$CHARS"']//g'

But is there a direct sed method for this?

Tags: unicode sed text-files non-printing-characters

2 answers

我命由我不由天

#2 · 2019-05-10 14:06

Try sed "s/\`//g" *. (I added the g so it will remove all the backticks it finds).

EDIT: It's not a backtick that OP wants to remove.

Following the solution in this question, this ought to work:

sed 's/\xc2\x92//g'

To demonstrate it does:

[foo@bar ~]$ CHARS=$(python -c 'print u"asdf\u0092asdf".encode("utf8")')
[foo@bar ~]$ echo $CHARS
asdf<funny glyph symbol>asdf
[foo@bar ~]$ echo $CHARS | sed 's/\xc2\x92//g'
asdfasdf

Seeing as it's something you tried already, perhaps what is in your text file is not U+0092?
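One way to check that guess is to look at the raw bytes before deleting anything. The following is only a sketch: the sample string is made up for illustration, GNU sed is assumed, and forcing `LC_ALL=C` (so that sed matches raw bytes instead of decoding multibyte characters) is an added assumption, not something the question or this answer used.

```shell
# Hypothetical sample line containing U+0092 encoded as UTF-8
# (bytes 0xC2 0x92, written here as the octal escapes \302 \222).
sample=$(printf 'asdf\302\222asdf')

# Dump the bytes in hex. A UTF-8 encoded U+0092 appears as the pair "c2 92".
printf '%s' "$sample" | od -An -tx1

# With the C locale forced, GNU sed treats the input as plain bytes,
# so the two-byte sequence can be deleted directly.
cleaned=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/\xc2\x92//g')
printf '%s\n' "$cleaned"
```

If the dump shows a lone `92` with no `c2` in front of it, the file is not valid UTF-8 at that point, and the pattern would have to match that single byte instead.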

你好瞎i

#3 · 2019-05-10 14:23

This might work for you (GNU sed):

echo "string containing funny character(s)" | sed -n 'l0'

This displays the string as sed sees it, with non-printable characters shown as octal escapes. Then use:

echo "string containing funny character(s)" | sed 's/\onnn//g'

Here nnn is the octal value of the byte to delete. A multibyte character needs one \onnn escape per byte.
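Applied to U+0092, the two steps might look like the sketch below. It assumes GNU sed, a made-up sample string, and a UTF-8 encoded character (bytes 0xC2 0x92, which are 302 and 222 in octal). `LC_ALL=C` is added here as an assumption so that every byte is listed and matched individually.

```shell
# Hypothetical sample line containing U+0092 as UTF-8 (octal bytes 302 222).
sample=$(printf 'asdf\302\222asdf')

# Step 1: 'l' lists the line unambiguously. Non-printable bytes are shown
# as \ooo octal escapes and the end of the line is marked with '$'.
shown=$(printf '%s\n' "$sample" | LC_ALL=C sed -n 'l')
printf '%s\n' "$shown"

# Step 2: delete the character with one \oNNN escape per byte.
cleaned=$(printf '%s\n' "$sample" | LC_ALL=C sed 's/\o302\o222//g')
printf '%s\n' "$cleaned"
```

To edit files in place, the same expression could be passed to `sed -i`, as in the question.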
