I have a big file, call it file.txt, which may have 20000 lines or more. Some of those lines have to be removed from the original file, and a new file containing the remaining lines has to be created, like newfile.txt. The line numbers of the lines to be deleted are in another file, like index.txt. So what I have is something like:
file.txt:
line1
line2
...
line19999
line20000
index.txt:
11
56
79
...
19856
I've been trying to use sed, trying to get it to use the numbers in the index to delete those lines, with something like:
for i in ${index.txt[@]}
do
sed -i.back '${i}d' file.txt>newfile.txt
done
However, I get an error saying ${index.txt[@]}: bad substitution, and I have no idea how to fix this.
I've also tried to use gawk, but there was something wrong with the code; I think it had to do with the fact that the file is indented with tabs. If anyone could help, I'd greatly appreciate it.
Do not call sed in a loop; that will be very slow.
You could transform the index file into a sed script, then call sed once on the data file:
sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt
Or, as @Hazzard17 points out, ignore lines that don't contain just digits:
script=$(sed -n '/^[[:blank:]]*[[:digit:]]\+[[:blank:]]*$/ s/$/d/p' index.txt)
sed -i.bak "$script" file.txt
A demo:
$ seq 20000 | sed 's/^/line/' > file.txt
$ wc file.txt
20000 20000 188894 file.txt
$ seq 20000 | while read n; do [[ $RANDOM -le 5000 ]] && echo $n; done > index.txt
$ wc index.txt
3083 3083 16789 index.txt
$ sed -i.bak "$(sed 's/$/d/' index.txt)" file.txt
$ wc -l file.txt{,.bak}
16917 file.txt
20000 file.txt.bak
36917 total
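To make it easier to see what the generated sed script actually contains, here is a much smaller sketch of the same idea. The six-line file.txt, the two-line index.txt and the scratch directory are made up for illustration, and the output goes to newfile.txt rather than being edited in place:

```shell
# Hypothetical miniature files in a scratch directory (illustration only).
tmp=$(mktemp -d)
printf 'line%s\n' 1 2 3 4 5 6 > "$tmp/file.txt"
printf '%s\n' 2 5 > "$tmp/index.txt"

# Turn each line number N into the sed command "Nd".
script=$(sed 's/$/d/' "$tmp/index.txt")
printf '%s\n' "$script"

# Apply every deletion in a single pass over the data file.
sed "$script" "$tmp/file.txt" > "$tmp/newfile.txt"
cat "$tmp/newfile.txt"
```

The first printf of "$script" should show one "Nd" command per line, which is exactly what sed receives as its program.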
To read a file into an array, you can do:
mapfile -t indices < index.txt
for i in "${indices[@]}"; do ...; done
or just iterate over the file:
while IFS= read -r i; do ...; done < index.txt
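If you do want a loop, one option is to use it only to build the script and still call sed once. This is a rough sketch, not a tested recipe for your data: the three-number index.txt and the variable names are just for illustration, and seq 100 stands in for the real file.

```shell
# Hypothetical index file in a scratch directory (illustration only).
tmp=$(mktemp -d)
printf '%s\n' 11 56 79 > "$tmp/index.txt"

# Accumulate one sed command per index line; sed itself is not run in the loop.
script=
while IFS= read -r i; do
    script="${script}${i}d;"
done < "$tmp/index.txt"
printf '%s\n' "$script"

# A single sed invocation applies all the deletions (seq 100 stands in for file.txt).
remaining=$(seq 100 | sed "$script" | wc -l)
echo "$remaining"
```

Here the loop only does string concatenation, so the cost of starting sed is paid once rather than once per index line.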
The following awk may help you here.
awk 'FNR==NR{a[$0];next} !(FNR in a)' index.txt file1.txt
This assumes that index.txt holds the line numbers that need to be deleted from file1.txt. Also, append > temp_file && mv temp_file file1.txt if you want to save the output back into the input file (file1.txt) itself.
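As a small worked example of that awk command, including the temp_file step, something like the following should do. The five-line file1.txt and two-line index.txt are made up for illustration, and everything lives in a scratch directory:

```shell
# Hypothetical miniature files in a scratch directory (illustration only).
tmp=$(mktemp -d)
printf 'line%s\n' 1 2 3 4 5 > "$tmp/file1.txt"
printf '%s\n' 2 4 > "$tmp/index.txt"

# While reading the first file (FNR==NR), store each line number as an array key.
# While reading the second file, print only lines whose number is not a key.
awk 'FNR==NR{a[$0];next} !(FNR in a)' "$tmp/index.txt" "$tmp/file1.txt" > "$tmp/temp_file" &&
    mv "$tmp/temp_file" "$tmp/file1.txt"
cat "$tmp/file1.txt"
```

Note that the keys are the whole index lines ($0), so this sketch assumes index.txt contains bare numbers with no surrounding spaces or tabs, and that it is not empty.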
Here is a solution that does not modify your index.txt and will output the results into newfile.txt:
#replace newlines in the file with "d;"
#After this, linenumbers will contain "11d;56d;79d;..."
linenumbers=$(tr '\n' ';' < index.txt | sed 's/;/d;/g')
#write file.txt with specified line numbers removed to newfile.txt
sed -e "$linenumbers" file.txt > newfile.txt