可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi I see ^@ symbols, interleaved in normal text. How can I:

Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.
Remove the null characters? Running strings on the file cleaned it up, but I'm just wondering if this is the best way?

回答1:

I’d use tr:

tr < file-with-nulls -d '\000' > file-without-nulls

If you are wondering if input redirection in the middle of the command arguments works, it does. Most shells will recognize and deal with I/O redirection (<, >, …) anywhere in the command line, actually.

回答2:

Use the following sed command for removing the null characters in a file.

sed -i 's/\x0//g' null.txt

this solution edits the file in place, important if the file is still being used. passing -i'ext' creates a backup of the original file with 'ext' suffix added.

回答3:

A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.

回答4:

I discovered the following, which prints out which lines, if any, have null characters:

perl -ne '/\000/ and print;' file-with-nulls

Also, an octal dump can tell you if there are nulls:

od file-with-nulls | grep ' 000'

回答5:

If the lines in the file end with \r\n\000 then what works is to delete the \n\000 then replace the \r with \n.

tr -d '\n\000' <infile | tr '\r' '\n' >outfile

回答6:

Here is example how to remove NULL characters using ex (in-place):

ex -s +"%s/\%x00//g" -cwq nulls.txt

and for multiple files:

ex -s +'bufdo!%s/\%x00//g' -cxa *.txt

^{For recursivity, you may use globbing option **/*.txt (if it is supported by your shell).}

Useful for scripting since sed and its -i parameter is a non-standard BSD extension.

See also: How to check if the file is a binary file and read all the files which are not?

回答7:

I used:

recode UTF-16..UTF-8 <filename>

to get rid of zeroes in file.

回答8:

I faced the same error with:

import codecs as cd
f=cd.open(filePath,'r','ISO-8859-1')

I solved the problem by changing the encoding to utf-16

f=cd.open(filePath,'r','utf-16')

Identifying and removing null characters in UNIX

问题:

回答1:

回答2:

回答3:

回答4:

回答5:

回答6:

回答7:

回答8:

收藏的人(0)

Identifying and removing null characters in UNIX

问题:

回答1:

回答2:

回答3:

回答4:

回答5:

回答6:

回答7:

回答8:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮