How to search for non-ASCII characters with bash t

Posted 2020-05-20 02:34

I have a large text file that contains a few Unicode characters that make LaTeX crash. How can I find the non-ASCII characters in the file using sed or similar tools in bash on Linux?

Tags: bash unicode grep

2 answers

等我变得足够好

Reply #2 · 2020-05-20 02:43

Try this command:

grep -P '[^\x00-\x7f]' file
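
A minimal sketch of how this command might be wrapped for everyday use, assuming GNU grep built with PCRE support (the -P option is not available in every grep). The function name find_nonascii is hypothetical and not part of the answer; -n adds line numbers, and LC_ALL=C asks grep to treat the input as raw bytes so that every byte above 0x7f counts as a match.

```shell
# Hypothetical helper (not from the original answer): print every line that
# contains a byte outside the ASCII range 0x00-0x7f, prefixed by its line number.
# Assumes GNU grep with PCRE support for -P.
find_nonascii() {
    LC_ALL=C grep -n -P '[^\x00-\x7f]' "$@"
}

# Example: only the second line contains a non-ASCII character (é).
printf 'plain ascii\ncaf\303\251\n' | find_nonascii
```

With file arguments it behaves like grep itself, for example find_nonascii chapter1.tex, where the file name is only an illustration.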


Bombasti

Reply #3 · 2020-05-20 02:49

Try:

nonascii() { LANG=C grep --color=always '[^ -~]\+'; }

Which can be used like:

printf 'ŨTF8\n' | nonascii

Within [], ^ means "not", so [^ -~] matches any character that is not between space and ~, that is, anything outside the printable ASCII range. Apart from also matching control characters, this finds non-ASCII characters, and it is a more portable though slightly less accurate version of the [^\x00-\x7f] pattern from the other answer. The \+ means "one or more"; it makes grep color each multibyte character as a whole, rather than wrapping color codes around individual bytes and thereby corrupting the multibyte sequence.
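
To illustrate the "slightly less accurate" caveat, here is a hedged sketch with two hypothetical helpers, nonascii_lines and nonascii_lines_keep_tabs, neither of which comes from the answer itself. Both set LC_ALL=C rather than LANG=C, on the assumption that LC_ALL may already be set in the calling environment, in which case it would take precedence over LANG. The first helper also reports lines that contain a tab; the second adds a literal tab to the bracket expression so that tabs are no longer reported.

```shell
# Hypothetical helpers, not part of the original answer.

# Report lines (with line numbers) containing any byte outside space..~.
# This also reports control characters such as tab.
nonascii_lines() {
    LC_ALL=C grep -n '[^ -~]' "$@"
}

# Same idea, but a literal tab is added to the set of allowed characters.
nonascii_lines_keep_tabs() {
    LC_ALL=C grep -n "[^$(printf '\t') -~]" "$@"
}

# Example input: line 1 has a tab, line 3 has a non-ASCII character (ï).
printf 'a\tb\nok\nna\303\257ve\n' | nonascii_lines
printf 'a\tb\nok\nna\303\257ve\n' | nonascii_lines_keep_tabs
```

The first call should list lines 1 and 3, the second only line 3. Neither helper uses --color or \+, since they are meant for locating lines rather than highlighting characters.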
