How to remove blank lines from a Unix file

2019-01-13 05:53发布

问题:

I need to remove all the blank lines from an input file and write into an output file. Here is my data as below.

11216,33,1032747,64310,1,0,0,1.878,0,0,0,1,1,1.087,5,1,1,18-JAN-13,000603221321

11216,33,1033196,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,059762153003

11216,33,1033246,31300,1,0,0,1.5391,0,0,0,1,1,1.054,5,1,1,18-JAN-13,000603211032

11216,33,1033280,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,055111034001

11216,33,1033287,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000378689701

11216,33,1033358,31118,1,0,0,1.5513,0,0,0,1,1,1.115,5,1,1,18-JAN-13,000093737301

11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041926

11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802041954

11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049326

11216,33,1035476,37340,1,0,0,1.7046,0,0,0,1,1,1.123,5,1,1,18-JAN-13,045802049383

11216,33,1036985,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000093415580

11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781202001

11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781261305

11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781603955

11216,33,1037003,15151,1,0,0,1.4436,0,0,0,1,1,1.065,5,1,1,18-JAN-13,000781615746

回答1:

sed -i '/^$/d' foo

This tells sed to delete every line matching the regex ^$ i.e. every empty line. The -i flag edits the file in-place, if your sed doesn't support that you can write the output to a temporary file and replace the original:

sed '/^$/d' foo > foo.tmp
mv foo.tmp foo

If you also want to remove lines consisting only of whitespace (not just empty lines) then use:

sed -i '/^[[:space:]]*$/d' foo

Edit: also remove whitespace at the end of lines, because apparently you've decided you need that too:

sed -i '/^[[:space:]]*$/d;s/[[:space:]]*$//' foo


回答2:

awk 'NF' filename

awk 'NF > 0' filename

sed -i '/^$/d' filename

awk '!/^$/' filename

awk '/./' filename

The NF also removes lines containing only blanks or tabs, the regex /^$/ does not.



回答3:

Use grep to match any line that has nothing between the start anchor (^) and the end anchor ($):

grep -v '^$' infile.txt > outfile.txt

If you want to remove lines with only whitespace, you can still use grep. I am using Perl regular expressions in this example, but here are other ways:

grep -P -v '^\s*$' infile.txt > outfile.txt

or, without Perl regular expressions:

grep -v '^[[:space:]]*$' infile.txt > outfile.txt


回答4:

sed -e '/^ *$/d' input > output

Deletes all lines which consist only of blanks (or is completely empty). You can change the blank to [ \t] where the \t is a representation for tab. Whether your shell or your sed will do the expansion varies, but you can probably type the tab character directly. And if you're using GNU or BSD sed, you can do the edit in-place, if that's what you want, with the -i option.


If I execute the above command still I have blank lines in my output file. What could be the reason?

There could be several reasons. It might be that you don't have blank lines but you have lots of spaces at the end of a line so it looks like you have blank lines when you cat the file to the screen. If that's the problem, then:

sed -e 's/  *$//' -e '/^ *$/d' input > output

The new regex removes repeated blanks at the end of the line; see previous discussion for blanks or tabs.

Another possibility is that your data file came from Windows and has CRLF line endings. Unix sees the carriage return at the end of the line; it isn't a blank, so the line is not removed. There are multiple ways to deal with that. A reliable one is tr to delete (-d) character code octal 15, aka control-M or \r or carriage return:

tr -d '\015' < input | sed -e 's/  *$//' -e '/^ *$/d' > output

If neither of those works, then you need to show a hex dump or octal dump (od -c) of the first two lines of the file, so we can see what we're up against:

head -n 2 input | od -c

Judging from the comments that sed -i does not work for you, you are not working on Linux or Mac OS X or BSD — which platform are you working on? (AIX, Solaris, HP-UX spring to mind as relatively plausible possibilities, but there are plenty of other less plausible ones too.)

You can try the POSIX named character classes such as sed -e '/^[[:space:]]*$/d'; it will probably work, but is not guaranteed. You can try it with:

echo "Hello World" | sed 's/[[:space:]][[:space:]]*/   /'

If it works, there'll be three spaces between the 'Hello' and the 'World'. If not, you'll probably get an error from sed. That might save you grief over getting tabs typed on the command line.



回答5:

grep . file

grep looks at your file line-by-line; the dot . matches anything except a newline character. The output from grep is therefore all the lines that consist of something other than a single newline.



回答6:

with awk

awk 'NF > 0' filename



回答7:

You can sed's -i option to edit in-place without using temporary file:

 sed -i '/^$/d' file


回答8:

To be thorough and remove lines even if they include spaces or tabs something like this in perl will do it:

cat file.txt | perl -lane "print if /\S/"

Of course there are the awk and sed equivalents. Best not to assume the lines are totally blank as ^$ would do.

Cheers