Question:
I have a text file containing unwanted null characters (ASCII NUL, \0). When I try to view it in vi, I see ^@ symbols interleaved with the normal text. How can I:
Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.
Remove the null characters? Running strings on the file cleaned it up, but I'm wondering whether this is the best way.
Answer 1:
I'd use tr:
tr < file-with-nulls -d '\000' > file-without-nulls
If you are wondering whether input redirection in the middle of the command arguments works, it does. Most shells recognize and handle I/O redirection (<, >, …) anywhere on the command line.
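If tr is not available, roughly the same deletion can be sketched in Python. This is only an illustration: the function names strip_nuls and strip_nuls_in_file are made up here, and the file is processed in chunks on the assumption that it may be large.

```python
def strip_nuls(data: bytes) -> bytes:
    """Return data with every NUL byte removed, similar to tr -d '\\000'."""
    # translate(None, delete) keeps all bytes except those listed in delete.
    return data.translate(None, b"\x00")


def strip_nuls_in_file(src: str, dst: str, chunk_size: int = 1 << 16) -> None:
    """Copy src to dst, dropping NUL bytes; src and dst must be different paths."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            fout.write(strip_nuls(chunk))
```

For example, strip_nuls_in_file("file-with-nulls", "file-without-nulls") would mirror the tr command above.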
Answer 2:
Use the following sed command for removing the null characters in a file.
sed -i 's/\x0//g' null.txt
This solution edits the file in place, which is important if the file is still being used. Passing -i'ext' creates a backup of the original file with the 'ext' suffix added.
Answer 3:
A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
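A rough way to check for that pattern before converting, sketched in Python. The helper names and the thresholds (at least a quarter of the bytes are NUL, and at least 90% of them fall on the same byte parity) are assumptions chosen for illustration, not an established detection rule.

```python
import codecs


def looks_like_utf16(data: bytes) -> bool:
    """Heuristic guess: does data look like UTF-16 text?"""
    # A byte order mark at the start is the strongest hint.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return True
    even_nuls = data[0::2].count(0)  # NULs at offsets 0, 2, 4, ...
    odd_nuls = data[1::2].count(0)   # NULs at offsets 1, 3, 5, ...
    total = even_nuls + odd_nuls
    # Assumed thresholds: many NULs, almost all on one parity.
    many_nuls = len(data) > 0 and total * 4 >= len(data)
    one_sided = max(even_nuls, odd_nuls) * 10 >= total * 9
    return many_nuls and one_sided


def utf16_to_utf8(data: bytes, encoding: str = "utf-16") -> bytes:
    """Decode UTF-16 bytes and re-encode them as UTF-8."""
    # "utf-16" relies on a BOM (or native byte order if there is none);
    # pass "utf-16-le" or "utf-16-be" when the byte order is known.
    return data.decode(encoding).encode("utf-8")
```

If the check fails, the NULs are probably genuine garbage and deleting them is the better fix.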
Answer 4:
I discovered the following, which prints out which lines, if any, have null characters:
perl -ne '/\000/ and print;' file-with-nulls
Also, an octal dump can tell you if there are nulls:
od -b file-with-nulls | grep ' 000'
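To get line numbers rather than the lines themselves, the same check can be sketched in Python. The function name lines_with_nuls is hypothetical, and lines are assumed to be separated by \n.

```python
def lines_with_nuls(data: bytes) -> list[int]:
    """Return the 1-based numbers of the lines that contain a NUL byte."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if b"\x00" in line
    ]
```

It takes the raw file contents, for example the result of open("file-with-nulls", "rb").read().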
Answer 5:
If the lines in the file end with \r\n\000, then what works is to delete the \n\000 and then replace the \r with \n.
tr -d '\n\000' <infile | tr '\r' '\n' >outfile
Answer 6:
Here is an example of how to remove NUL characters using ex (in place):
ex -s +"%s/\%x00//g" -cwq nulls.txt
and for multiple files:
ex -s +'bufdo!%s/\%x00//g' -cxa *.txt
To recurse into subdirectories, you can use the globbing pattern **/*.txt (if your shell supports it).
This is useful for scripting, since sed's -i option is a non-standard extension.
See also: How to check if the file is a binary file and read all the files which are not?
Answer 7:
I used:
recode UTF-16..UTF-8 <filename>
to get rid of the zeroes in the file.
Answer 8:
I faced the same error with:
import codecs as cd
f=cd.open(filePath,'r','ISO-8859-1')
I solved the problem by changing the encoding to utf-16:
f=cd.open(filePath,'r','utf-16')
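The reason this works: ISO-8859-1 maps every byte to one character, so the zero bytes of UTF-16 text come through as literal NUL characters, whereas the utf-16 codec combines each byte pair into a single character. A minimal sketch of the same fix using the built-in open(), assuming Python 3; the function name read_utf16_text is made up for illustration.

```python
def read_utf16_text(path: str) -> str:
    """Read a UTF-16 encoded text file and return its contents as str."""
    # The "utf-16" codec consumes the byte order mark, if present,
    # and uses it to choose the byte order.
    with open(path, "r", encoding="utf-16") as f:
        return f.read()
```

The with block also closes the file, which the codecs.open calls above leave to the caller.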