Remove a specific line from a file WITHOUT using s

Posted 2020-03-07 11:18

Question:

I need to remove a specific line number from a file using a bash script.

I get the line number from the grep command with the -n option.

I cannot use sed for a variety of reasons, not the least of which is that it is not installed on all the systems this script needs to run on, and installing it is not an option.

awk is out of the question because in testing, on different machines with different UNIX/Linux OS's (RHEL, SunOS, Solaris, Ubuntu, etc.), it gives (sometimes wildly) different results on each. So, no awk.

The file in question is just a flat text file, with one record per line, so nothing fancy needs to be done except removing the line by number.

If at all possible, I need to avoid doing something like extracting the contents of the file, not including the line I want gone, and then overwriting the original file.

Answer 1:

Since you have grep, the obvious thing to do is:

$ grep -v "line to remove" file.txt > /tmp/tmp
$ mv /tmp/tmp file.txt
$

But it sounds like you don't want to use any temporary files - I assume the input file is large and this is an embedded system where memory and storage are in short supply. I think you ideally need a solution that edits the file in place. I think this might be possible with dd but haven't figured it out yet :(

Update - I figured out how to edit the file in place with dd. Also grep, head and cut are needed. If these are not available then they can probably be worked around for the most part:

#!/bin/bash

# get the line number to remove
rline=$(grep -n "$1" "$2" | head -n1 | cut -d: -f1)
# number of bytes before the line to be removed
hbytes=$(head -n$((rline-1)) "$2" | wc -c)
# number of bytes to remove (first matching line only, to agree with rline)
rbytes=$(grep "$1" "$2" | head -n1 | wc -c)
# original file size
fsize=$(cat "$2" | wc -c)
# dd will start reading the file after the line to be removed
ddskip=$((hbytes + rbytes))
# dd will start writing at the beginning of the line to be removed
ddseek=$hbytes
# dd will move this many bytes
ddcount=$((fsize - hbytes - rbytes))
# the expected new file size
newsize=$((fsize - rbytes))
# move the bytes with dd.  strace confirms the file is edited in place
dd bs=1 if="$2" skip=$ddskip seek=$ddseek conv=notrunc count=$ddcount of="$2"
# truncate the remainder bytes of the end of the file
dd bs=1 if="$2" skip=$newsize seek=$newsize count=0 of="$2"

Run it thusly:

$ cat > file.txt
line 1
line two
line 3
$ ./grepremove "tw" file.txt
7+0 records in
7+0 records out
0+0 records in
0+0 records out
$ cat file.txt
line 1
line 3
$ 

Suffice it to say that dd is a very dangerous tool. You can easily overwrite files or entire disks by accident. Be very careful!
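
The "7+0 records" in the run above follows from the offset arithmetic in the script. As a rough sketch of that arithmetic only, the hypothetical helper below (dd_offsets is not part of the original answer) prints the values that would be handed to dd without touching the file. It assumes the first matching line is the one to remove.

```shell
# dd_offsets PATTERN FILE: print the offsets the script above would hand to dd.
# Hypothetical helper for illustration; it only reads FILE and never runs dd.
dd_offsets() {
    rline=$(grep -n -- "$1" "$2" | head -n 1 | cut -d: -f1)
    [ -n "$rline" ] || return 1                 # no match: nothing to compute
    if [ "$rline" -gt 1 ]; then
        hbytes=$(head -n "$((rline - 1))" "$2" | wc -c)
    else
        hbytes=0                                # the match is the first line
    fi
    rbytes=$(grep -- "$1" "$2" | head -n 1 | wc -c)   # first match only
    fsize=$(wc -c < "$2")
    echo "skip=$((hbytes + rbytes)) seek=$((hbytes)) count=$((fsize - hbytes - rbytes)) newsize=$((fsize - rbytes))"
}
```

For the three-line file.txt above, `dd_offsets tw file.txt` prints `skip=16 seek=7 count=7 newsize=14`: "line 1" takes 7 bytes with its newline, "line two" takes 9, and the remaining 7 bytes are the ones dd moves.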



Answer 2:

Try ed. The here-document-based example below deletes line 2 from test.txt:

ed -s test.txt <<!
2d
w
!
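
Since the line number comes from grep, it may help to pass it in as a variable. A minimal sketch, assuming ed is installed; delete_line_ed is a hypothetical name, and the explicit q is added here only to end the session cleanly. Note that ed typically edits a buffer copy and rewrites the file on w, so this is probably not a true in-place edit.

```shell
# delete_line_ed N FILE: delete line N of FILE using ed.
# Hypothetical wrapper around the here-document example above.
delete_line_ed() {
    # printf emits one ed command per line: delete line N, write, quit
    printf '%s\n' "${1}d" w q | ed -s "$2"
}
```

It would be called as `delete_line_ed 2 test.txt`.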


Answer 3:

If n is the line you want to omit:

{
  head -n $(( n-1 )) file
  tail -n +$(( n+1 )) file
} > newfile
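
One edge case is n=1, where the head call would be asked for zero lines; POSIX only specifies head -n for positive counts. A small sketch that sidesteps this, with print_without_line as a hypothetical function name:

```shell
# print_without_line N FILE: write FILE to stdout, omitting line N.
print_without_line() {
    n=$1
    # Skip head entirely when removing the first line, so it never sees -n 0
    if [ "$n" -gt 1 ]; then
        head -n "$((n - 1))" "$2"
    fi
    # tail -n +K starts output at line K
    tail -n "+$((n + 1))" "$2"
}
```

Usage would be `print_without_line "$n" file > newfile`, which still writes a second file rather than editing in place.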


Answer 4:

You can do it without grep, using POSIX shell builtins, which should be available on any *nix.

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r LINE || [ -n "$LINE" ]; do
  case "$LINE" in
    *thing_you_are_grepping_for*) continue ;;
    *) printf '%s\n' "$LINE" ;;
  esac
done <infile >outfile


Answer 5:

Given that dd is deemed too dangerous for this in-place line removal, we need some other method that gives fairly fine-grained control over the file system calls. My initial urge is to write something in C, but while possible, I think that is a bit of overkill. Instead, it is worth looking at common scripting (not shell-scripting) languages, as these typically have fairly low-level file APIs that map to the file syscalls in a fairly straightforward manner. I'm guessing this can be done using Python, Perl, Tcl or one of many other scripting languages that might be available. I'm most familiar with Tcl, so here we go:

#!/bin/sh
# \
exec tclsh "$0" "$@"

package require Tclx

set removeline [lindex $argv 0]
set filename [lindex $argv 1]

set infile [open $filename RDONLY]
for {set lineNumber 1} {$lineNumber < $removeline} {incr lineNumber} {
    if {[eof $infile]} {
        close $infile
        puts "EOF at line $lineNumber"
        exit
    }
    gets $infile line
}
set bytecount [tell $infile]
gets $infile rmline

set outfile [open $filename RDWR]
seek $outfile $bytecount start

while {[gets $infile line] >= 0} {
    puts $outfile $line
}

ftruncate -fileid $outfile [tell $outfile]
close $infile
close $outfile

Note that on my particular box I have Tcl 8.4, so I had to load the Tclx package in order to use the ftruncate command. In Tcl 8.5, there is chan truncate, which could be used instead.

You can pass the line number you want to remove and the filename to this script.

In short, the script does this:

  • open the file for reading
  • read the first n-1 lines
  • get the offset of the start of the next line (line n)
  • read line n
  • open the file with a new FD for writing
  • move the file location of the write FD to the offset of the start of line n
  • continue reading the remaining lines from the read FD and write them to the write FD until the whole read FD is read
  • truncate the write FD

The file is edited exactly in place. No temporary files are used.

I'm pretty sure this can be rewritten in Python or Perl or ... if necessary.

Update

Ok, so in-place line removal can be done in almost-pure bash, using techniques similar to the Tcl script above. But the big caveat is that you need to have the truncate command available. I do have it on my Ubuntu 12.04 VM, but not on my older Red Hat-based box. Here is the script:

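#!/bin/bash

# make ${#line} count bytes rather than characters
LC_ALL=C
n=$1
filename=$2
# two independent descriptors on the same file: 3 reads, 4 writes
exec 3< "$filename"
exec 4<> "$filename"
linecount=1
bytecount=0
# the || test keeps a final line that has no trailing newline
while IFS= read -r line <&3 || [[ -n $line ]]; do
    if [[ $linecount -eq $n ]]; then
        echo "omitting line $linecount: $line"
    else
        printf '%s\n' "$line" >&4
        ((bytecount += ${#line} + 1))
    fi
    ((linecount++))
done
exec 3<&-
exec 4>&-

# cut off the leftover bytes at the end of the file
truncate -s "$bytecount" "$filename"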
#### or if you can tolerate dd, just to do the truncate:
# dd of="$filename" bs=1 seek=$bytecount count=0
#### or if you have python
# python -c "open(\"$filename\", \"ab\").truncate($bytecount)"

I would love to hear of a more generic (bash-only?) way to do the partial truncate at the end and complete this answer. Of course the truncate can be done with dd as well, but I think that was already ruled out for my earlier answer.

And for the record this site lists how to do an in-place file truncation in many different languages - in case any of these could be used in your environment.



Answer 6:

If you can indicate under which circumstances on which platform(s) the most obvious Awk script is failing for you, perhaps we can devise a workaround.

awk "NR!=$N" infile >outfile

Of course, obtaining $N with grep just to feed it to Awk is pretty bass-ackwards. This will delete the line containing the first occurrence of foo:

awk '/foo/ { if (!p++) next } 1' infile >outfile
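
Both one-liners can be wrapped so that the line number or pattern is passed with -v rather than spliced into the program text. This is a sketch that assumes a POSIX awk; the function names are hypothetical. On older Solaris systems the default /usr/bin/awk is a legacy implementation, so nawk or /usr/xpg4/bin/awk may be needed there.

```shell
# awk_remove_line N FILE: print FILE without line N.
awk_remove_line() {
    # -v hands N to awk as a variable instead of editing the program string
    awk -v n="$1" 'NR != n' "$2"
}

# awk_remove_first_match REGEX FILE: print FILE without the first line matching REGEX.
awk_remove_first_match() {
    # "done" is an ordinary awk variable, set once the first match has been skipped
    awk -v re="$1" '!done && $0 ~ re { done = 1; next } 1' "$2"
}
```

Both functions write to stdout, so the result still has to be redirected to a second file, as in the examples above.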


Answer 7:

Based on Digital Trauma's answer, I found an improvement that just needs grep and echo, but no tempfile:

echo $(grep -v PATTERN file.txt) > file.txt

Unquoted, the output of the command substitution is word-split, so echo joins all remaining lines into a single line separated by spaces. To keep one record per line, enclose the command substitution in double quotes:

echo "$(grep -v PATTERN file.txt)" > file.txt

(useful when deleting from your crontab)
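
The quoted form can be spelled out a little more defensively. The sketch below, with rewrite_without as a hypothetical name, captures the grep output first and only then redirects into the file, so a grep error leaves the file alone. It assumes the whole file fits in memory, and command substitution strips trailing blank lines.

```shell
# rewrite_without PATTERN FILE: drop lines matching PATTERN, no temporary file.
rewrite_without() {
    status=0
    # Read everything first; FILE is only truncated by a redirection further down
    content=$(grep -v -- "$1" "$2") || status=$?
    if [ "$status" -gt 1 ]; then
        return "$status"                 # grep itself failed: leave FILE untouched
    elif [ -z "$content" ]; then
        : > "$2"                         # no non-empty line survived
    else
        printf '%s\n' "$content" > "$2"
    fi
}
```

For example, `rewrite_without PATTERN file.txt` removes every line matching PATTERN, not just the first one.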



Tags: linux bash unix