I have a big file, call it file.txt, which may have 20000 lines or more. Some of those lines have to be removed from the original file, and a new file containing the remaining lines has to be created, like newfile.txt. The numbers of the lines to be deleted are in another file, like index.txt. So what I have is something like:
file.txt:
line1
line2
...
line19999
line20000
index.txt:
11
56
79
...
19856
I've been trying to use sed, trying to get it to use the numbers in the index to delete those lines, with something like:
for i in ${index.txt[@]}
do
sed -i.back '${i}d' file.txt>newfile.txt
done
However, I get an error saying ${index.txt[@]}: bad substitution, and I have no idea how to fix this.
I've also tried to use gawk, but there was something wrong with the code; I think it had to do with the fact that the file is indented with tabs. If anyone could help, I'd greatly appreciate it.
Here is a solution that does not modify your index.txt and will output the results into newfile.txt:
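The command for this answer did not survive in the thread, so what follows is only a minimal sketch of one way to get that behaviour, here using awk. The sample contents of file.txt and index.txt are made up for illustration; the file names are the ones from the question.

```shell
# Sample data standing in for file.txt and index.txt (hypothetical contents).
seq -f 'line%g' 1 10 > file.txt
printf '%s\n' 2 5 9 > index.txt

# First file (NR==FNR): remember every line number listed in index.txt.
# Second file: print only the lines whose number was not remembered.
awk 'NR==FNR { del[$1]; next } !(FNR in del)' index.txt file.txt > newfile.txt
```

index.txt is only read, never written, and file.txt is left untouched. One caveat: if index.txt is empty, the NR==FNR test also matches the lines of file.txt, so newfile.txt ends up empty.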
awk may help you here, assuming your index file holds the line numbers that need to be deleted from file1.txt. Append > temp_file && mv temp_file file1.txt to the command in case you want to save the output back into the input file (file1.txt) itself.
Do not call sed in a loop; that will be very slow.
You could transform the index file into a sed script, then call sed once on the data file:
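As a rough sketch of that idea (the original command is not preserved here), each number N in the index can be turned into the sed command Nd. The intermediate file name delete.sed and the sample data are assumptions made for this example.

```shell
# Sample data standing in for file.txt and index.txt (hypothetical contents).
seq -f 'line%g' 1 10 > file.txt
printf '%s\n' 2 5 9 > index.txt

# Append "d" to every index line, so "5" becomes the sed command "5d".
sed 's/$/d/' index.txt > delete.sed

# Run sed once over the data file, applying all delete commands in one pass.
sed -f delete.sed file.txt > newfile.txt
```

In bash the temporary script can be avoided with process substitution, as in sed -f <(sed 's/$/d/' index.txt) file.txt > newfile.txt.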
Or, as @Hazzard17 points out, ignore lines that don't contain just digits:
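One possible way to do that filtering, sketched under the same assumptions (made-up sample data, hypothetical delete.sed script file), is to emit a delete command only for index lines that consist entirely of digits:

```shell
# Sample data; the index contains a blank line and a stray word on purpose.
seq -f 'line%g' 1 10 > file.txt
printf '%s\n' 2 '' 'not a number' 5 > index.txt

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, i.e. lines made up of digits and nothing else.
sed -n 's/^[0-9][0-9]*$/&d/p' index.txt > delete.sed

sed -f delete.sed file.txt > newfile.txt
```

Blank lines and anything that is not a plain number never reach the generated script, so they cannot produce a sed syntax error.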
A demo:
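The original demo output is not preserved, so here is a small stand-in that can be pasted into a terminal. It uses seq for the data and an inline index; the delete.sed file name is again just an assumption for the example.

```shell
# Index with two valid line numbers and one junk entry ("x").
printf '%s\n' 3 x 7 | sed -n 's/^[0-9][0-9]*$/&d/p' > delete.sed

# Delete lines 3 and 7 from a ten-line stream.
seq 10 | sed -f delete.sed
# prints 1 2 4 5 6 8 9 10, one number per line
```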
To read a file into an array, you can do:
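The "bad substitution" error comes from ${index.txt[@]} treating a file name as if it were an array variable; the file has to be read into a real array first. A minimal sketch, assuming bash 4 or later (mapfile is not available in older bash or in plain sh) and a hypothetical array name indices:

```shell
#!/usr/bin/env bash
# Sample index file (hypothetical contents).
printf '%s\n' 11 56 79 > index.txt

# mapfile reads one line per array element; -t strips the trailing newlines.
mapfile -t indices < index.txt

echo "${#indices[@]} entries, first is ${indices[0]}"
# prints: 3 entries, first is 11
```

After this, for i in "${indices[@]}" iterates over the line numbers as the question intended.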
Or just iterate over the file:
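A sketch of such a loop, with made-up sample data and a hypothetical counter variable added only so the effect is visible:

```shell
# Sample index file (hypothetical contents).
printf '%s\n' 11 56 79 > index.txt

count=0
# IFS= and -r keep each line exactly as written (no trimming, no backslash
# processing); the redirection feeds index.txt to the loop line by line.
while IFS= read -r i; do
    echo "would delete line $i"
    count=$((count + 1))
done < index.txt

echo "$count line numbers read"
```

This form works in any POSIX shell and needs no array, but calling sed inside the loop body would bring back the one-process-per-line slowness mentioned earlier.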