Use SED to delete certain lines using an index wit

Posted 2020-03-27 04:01

Question:

I have a big file, call it file.txt, which may have 20000 lines or more. Some of those lines have to be removed from the original file, and a new file containing the remaining lines has to be created, say newfile.txt. The line numbers to delete are listed in another file, say index.txt. So what I have is something like:

file.txt:

line1
line2
...
line19999
line20000

index.txt:

11
56
79
...
19856

I've been trying to use sed, trying to get it to use the numbers in the index to delete those lines, with something like:

for i in ${index.txt[@]}
do
    sed -i.back '${i}d' file.txt>newfile.txt
done

However, I get an error saying ${index.txt[@]}: bad substitution, and I have no idea how to fix this.

I've also tried to use gawk, but there was something wrong with the code; I think it had to do with the fact that the file is indented with tabs. If anyone could help, I'd greatly appreciate it.

Answer 1:

Do not call sed in a loop; that will be very slow, since each call rereads the whole file.

You could transform the index file into a sed script, then call sed once on the data file:

sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt

Or, as @Hazzard17 points out, ignore lines that don't contain just digits:

script=$(sed -n '/^[[:blank:]]*[[:digit:]]\+[[:blank:]]*$/ s/$/d/p' index.txt)
sed -i.bak "$script" file.txt

A demo:

$ seq 20000 | sed 's/^/line/' > file.txt
$ wc file.txt
 20000  20000 188894 file.txt
$ seq 20000 | while read n; do [[ $RANDOM -le 5000 ]] && echo $n; done > index.txt
$ wc index.txt
 3083  3083 16789 index.txt
$ sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt
$ wc -l file.txt{,.bak}
 16917 file.txt
 20000 file.txt.bak
 36917 total
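
The question asks for the result in newfile.txt with file.txt left alone. A minimal sketch of the same one-pass idea without -i, assuming GNU or POSIX sed and an index.txt that holds one line number per line; the scratch directory and the ten-line sample data are made up for illustration:

```shell
# Work in a scratch directory so nothing real is touched (demo data only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 10 | sed 's/^/line/' > file.txt     # line1 .. line10
printf '%s\n' 2 5 9 > index.txt         # line numbers to delete

# The inner sed turns each index line N into the command "Nd";
# the outer sed runs all of those commands in a single pass over file.txt.
sed "$(sed 's/$/d/' index.txt)" file.txt > newfile.txt

cat newfile.txt
```

Because there is no -i here, file.txt keeps all ten lines and only newfile.txt receives the filtered output.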

To read a file into an array, you can do:

mapfile -t indices < index.txt
for i in "${indices[@]}"; do ...; done

Or just iterate over the file:

while IFS= read -r i; do ...; done < index.txt
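
One way to fill in the loop body without calling sed on every iteration is to have the loop only emit sed commands, then run sed once. This is a sketch under assumptions: delete.sed is a hypothetical file name, the sample data is invented, and non-numeric index lines are simply skipped.

```shell
# Scratch directory and sample data (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 10 | sed 's/^/line/' > file.txt
printf '%s\n' 3 'not a number' 4 8 > index.txt   # one junk line on purpose

# The loop is cheap: it only prints one "Nd" command per valid index line.
while IFS= read -r i; do
    case $i in
        ''|*[!0-9]*) continue ;;    # skip empty or non-numeric lines
    esac
    printf '%sd\n' "$i"
done < index.txt > delete.sed       # delete.sed is a hypothetical name

# A single sed invocation applies the whole script.
sed -f delete.sed file.txt > newfile.txt

cat newfile.txt
```

Passing the script with -f rather than as a command-line argument may also help if the index grows large enough to run into the system's argument-length limit.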


Answer 2:

The following awk command may help here:

awk 'FNR==NR{a[$0];next} !(FNR in a)' index.txt file1.txt

This assumes index.txt holds the line numbers to delete from file1.txt. Append > temp_file && mv temp_file file1.txt if you want to save the output back into the input file (file1.txt) itself.
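
A small worked run of the awk approach, including the temp_file step. This is a sketch with invented sample data; it uses $1 rather than $0 as the array key, which is my own variation so that stray spaces or tabs around a number in index.txt do not prevent a match. The array name skip is likewise just illustrative.

```shell
# Scratch directory and sample data (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 8 | sed 's/^/line/' > file1.txt
printf '%s\n' 1 4 > index.txt

# While reading the first file (FNR==NR), store each line number as a key.
# While reading the second file, print only lines whose number is not a key.
awk 'FNR==NR { skip[$1]; next } !(FNR in skip)' index.txt file1.txt > temp_file &&
    mv temp_file file1.txt

cat file1.txt
```

Note that the FNR==NR test relies on index.txt being non-empty; with an empty index file, awk would treat file1.txt as the first file.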



Answer 3:

Here is a solution that does not modify your index.txt and will output the results into newfile.txt:

# Replace each newline in index.txt with "d;".
# After this, linenumbers will contain "11d;56d;79d;..."
linenumbers=$(tr '\n' ';' < index.txt | sed 's/;/d;/g')

# Write file.txt, with the specified line numbers removed, to newfile.txt
sed -e "$linenumbers" file.txt > newfile.txt
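
To see what the intermediate string looks like, here is a tiny run of the same two commands on invented data. It assumes index.txt ends with a newline; without one, the last number would not get its trailing "d".

```shell
# Scratch directory and sample data (illustrative only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 5 | sed 's/^/line/' > file.txt
printf '%s\n' 2 3 > index.txt

# Newlines become ";", then every ";" becomes "d;".
linenumbers=$(tr '\n' ';' < index.txt | sed 's/;/d;/g')
printf '%s\n' "$linenumbers"        # prints 2d;3d;

# file.txt is only read; the filtered copy goes to newfile.txt.
sed -e "$linenumbers" file.txt > newfile.txt
cat newfile.txt
```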