Identifying and removing null characters in UNIX

Posted 2019-01-06 09:48

I have a text file containing unwanted null characters (ASCII NUL, \0). When I view it in vi, I see ^@ symbols interleaved with the normal text. How can I:

  1. Identify which lines in the file contain null characters? I have tried grepping for \0 and \x0, but this did not work.

  2. Remove the null characters? Running strings on the file cleaned it up, but I'm just wondering if this is the best way?

8 answers
兄弟一词,经得起流年.
Reply #2 · 2019-01-06 10:21

Here is an example of how to remove NUL characters in place using ex:

ex -s +"%s/\%x00//g" -cwq nulls.txt

and for multiple files:

ex -s +'bufdo!%s/\%x00//g' -cxa *.txt

To recurse into subdirectories, you can use the glob **/*.txt, if your shell supports it (for example, bash with shopt -s globstar).

This is useful for scripting, since sed's -i option is a non-standard extension whose syntax differs between the GNU and BSD implementations.

See also: How to check if the file is a binary file and read all the files which are not?
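
For comparison, the same in-place cleanup can be sketched in Python. This is a minimal illustration rather than part of the ex approach above: the function name strip_nuls is hypothetical, and the sketch assumes the file is small enough to read into memory in one go.

```python
def strip_nuls(path):
    """Rewrite `path` without NUL bytes and return how many were removed."""
    # Binary mode keeps every byte exactly as stored, including NULs.
    with open(path, "rb") as f:
        data = f.read()
    cleaned = data.replace(b"\x00", b"")
    removed = len(data) - len(cleaned)
    if removed:
        # Only touch the file when there is actually something to remove.
        with open(path, "wb") as f:
            f.write(cleaned)
    return removed
```

Calling strip_nuls("nulls.txt") would rewrite that file and report the number of NUL bytes dropped. Note that this overwrites the file directly, so keep a backup if the data matters.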

forever°为你锁心
Reply #3 · 2019-01-06 10:24

I discovered the following, which prints out which lines, if any, have null characters:

perl -ne '/\000/ and print;' file-with-nulls

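Also, an octal dump can tell you whether there are nulls. Use -b so that od prints one octal value per byte; the default output groups bytes into two-byte words, which makes the grep unreliable:

od -b file-with-nulls | grep ' 000'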
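
If you also want the line numbers, a short Python sketch can do the same per-line check. The function name lines_with_nuls is made up for illustration; lines are split on \n only, which is an assumption about the file's line endings.

```python
def lines_with_nuls(path):
    """Return the 1-based numbers of the lines in `path` that contain a NUL byte."""
    hits = []
    # Binary mode: nothing is decoded, so NUL bytes reach us untouched.
    with open(path, "rb") as f:
        for number, line in enumerate(f, start=1):
            if b"\x00" in line:
                hits.append(number)
    return hits
```

For example, lines_with_nuls("file-with-nulls") returns a list such as [2, 4] when the second and fourth lines contain nulls, and an empty list when the file is clean.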
在下西门庆
Reply #4 · 2019-01-06 10:25

I used:

recode UTF-16..UTF-8 <filename>

to get rid of the zeroes in the file.

小情绪 Triste *
Reply #5 · 2019-01-06 10:32

A large number of unwanted NUL characters, say one every other byte, indicates that the file is encoded in UTF-16 and that you should use iconv to convert it to UTF-8.
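
The "one NUL every other byte" observation can be turned into a rough check. The sketch below uses Python's built-in codecs instead of iconv; the function names and the 40% threshold are assumptions chosen for illustration, and the heuristic only works well for mostly-ASCII text.

```python
import codecs

BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_like_utf16(data):
    """Heuristic: a UTF-16 BOM, or NULs filling much of one byte position."""
    if data.startswith(BOMS):
        return True
    if len(data) < 2:
        return False
    even_nuls = data[0::2].count(b"\x00")  # NULs at offsets 0, 2, 4, ...
    odd_nuls = data[1::2].count(b"\x00")   # NULs at offsets 1, 3, 5, ...
    # Assumed threshold: at least 40% of one position is NUL.
    return max(even_nuls, odd_nuls) >= 0.4 * (len(data) // 2)


def utf16_to_utf8(data):
    """Decode UTF-16 bytes and re-encode them as UTF-8."""
    if data.startswith(BOMS):
        text = data.decode("utf-16")     # the BOM selects the byte order
    elif data[1::2].count(b"\x00") >= data[0::2].count(b"\x00"):
        text = data.decode("utf-16-le")  # NULs at odd offsets: low byte first
    else:
        text = data.decode("utf-16-be")
    return text.encode("utf-8")
```

A typical use would be to read the file in binary mode, call looks_like_utf16 on the bytes, and write the result of utf16_to_utf8 to a new file only when the check passes.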

干净又极端
Reply #6 · 2019-01-06 10:33

I faced the same error with:

import codecs as cd
# Reading UTF-16 data as ISO-8859-1 turns every zero byte into a NUL character
f = cd.open(filePath, 'r', 'ISO-8859-1')

I solved the problem by changing the encoding to utf-16:

f = cd.open(filePath, 'r', 'utf-16')
Fickle 薄情
Reply #7 · 2019-01-06 10:34

If the lines in the file end with \r\n\000, what works is to delete the \n and \000 characters and then replace each \r with \n.

tr -d '\n\000' <infile | tr '\r' '\n' >outfile
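
The same two steps can be written in Python, shown here only as an equivalent sketch of the tr pipeline (the function name fix_line_endings is hypothetical). Like the tr version, it deletes every \n and NUL byte in the data, not just those at line ends.

```python
def fix_line_endings(data):
    """Delete LF and NUL bytes, then turn each CR into LF."""
    # translate(None, ...) removes the listed bytes without remapping the rest.
    stripped = data.translate(None, b"\n\x00")
    return stripped.replace(b"\r", b"\n")
```

Applied to the bytes read from infile in binary mode, the result can be written to outfile, also in binary mode.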