How to convert a ASCII NULL (NUL) into single spac

2019-05-30 09:11发布

问题:

When I BCP the data in sql server

In the output file I am getting a NUL like character in the output file, and i want to replace this with the single blank space.

When I used the below sed command it removes the NUL character but between those 2 delimiter we don't have single space.

sed 's/\x0/ /g' output file name

Example: After sed command i am getting output file like below

PHMO||P00000005233
PHMO||P00000005752

But i need a single spacing in between those delimiter as

PHMO| |P00000005233
PHMO| |P00000005752

回答1:

The usual approach to this would be using tr. However, solutions with tr and sed are not portable. (The question is tagged "unix", so only portable solutions are interesting).

Here is a simple demo script

#!/bin/sh
date
tr '\000' ' ' <$0.in
date
sed -e 's/\x00/ /g' <$0.in

which I named foo, and its input (with the ASCII NUL shown here as ^@):

this is a null: "^@"

Running with GNU tr and sed:

Fri Apr  1 04:41:15 EDT 2016
this is a null: " "
Fri Apr  1 04:41:15 EDT 2016
this is a null: " "

With OSX:

Fri Apr  1 04:41:53 EDT 2016
this is a null: " "
Fri Apr  1 04:41:53 EDT 2016
this is a null: "^@"

With Solaris 10 (and 11, though there may be a recent change):

Fri Apr  1 04:38:08 EDT 2016
this is a null: ""
Fri Apr  1 04:38:08 EDT 2016
this is a null: ""

Bear in mind that sed is line-oriented, and that ASCII NUL is considered a binary (non-line) character. If you want a portable solution, then other tools such as Perl (which do not have that limitation) are useful. For that case one could add this to the script:

perl -np -e 's/\0/ /g' <$0.in

The intermediate tool awk is no better in this instance. Going to Solaris again, with these lines:

for awk in awk nawk mawk gawk
do
echo "** $awk:"
$awk '{ gsub("\0"," "); print; }' <$0.in
done

I see this output:

** awk:
awk: syntax error near line 1
awk: illegal statement near line 1
** nawk:
nawk: empty regular expression
 source line number 1
 context is
        { gsub("\0"," >>>  ") <<<
** mawk:
this is a null: " "
** gawk:
this is a null: " "

Further reading:

  • sed - stream editor (POSIX)
  • tr - translate characters (POSIX), which notes

Unlike some historical implementations, this definition of the tr utility correctly processes NUL characters in its input stream. NUL characters can be stripped by using:

tr -d '\000'

  • perlrun - how to execute the Perl interpreter


回答2:

This is an easy job for sed. Let's start creating a test file as you didn't provide one:

$ echo -e "one,\x00,two,\x00,three" > a
$ echo -e "four,\x00,five,\x00,six" >> a

As you can see it contains ASCII 0:

$ od -c a
0000000   o   n   e   ,  \0   ,   t   w   o   ,  \0   ,   t   h   r   e
0000020   e  \n   f   o   u   r   ,  \0   ,   f   i   v   e   ,  \0   ,
0000040   s   i   x  \n
0000044

Now let's run sed:

$ sed 's/\x00/ /g' a > b

And check the output:

$ cat b
one, ,two, ,three
four, ,five, ,six
$ od -c b
0000000   o   n   e   ,       ,   t   w   o   ,       ,   t   h   r   e
0000020   e  \n   f   o   u   r   ,       ,   f   i   v   e   ,       ,
0000040   s   i   x  \n
0000044


回答3:

it can be done quite easily with perl

cat -v inputfile.txt
abc^@def^@ghij^@klmnop^@qrstuv^@wxyz

perl -np -e 's/\0/ /g' <inputfile.txt >outputfile.txt

cat -v outputfile.txt
abc def ghij klmnop qrstuv wxyz