I'm trying to write a script that extracts substrings from one file in a loop, taking the positions where to cut from a second file. I'm working in bash on MobaXterm. The file cut_positions.txt is tab-delimited, with the columns name, start position, end position, length and comment:
k141_20066 103484 104617 1133 phnW
k141_20841 13200 14324 1124 phnW
k141_23852 69 452 383 phnW
k141_32328 1 180 179 phnW
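In all four sample rows the length column equals end minus start, so substr(string, start, length) returns positions start through end - 1, not through end. If the end coordinate is meant to be inclusive, the length to cut would be end - start + 1. A small check, assuming the tab-delimited layout shown above (the temporary file exists only for the demonstration):

```shell
#!/usr/bin/env bash
# Sketch: check how the length column relates to the start/end columns.
# The rows are copied from the sample above; the temp file is illustrative only.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'k141_20066\t103484\t104617\t1133\tphnW\n' >  "$workdir/cut_positions.txt"
printf 'k141_20841\t13200\t14324\t1124\tphnW\n'   >> "$workdir/cut_positions.txt"
printf 'k141_23852\t69\t452\t383\tphnW\n'         >> "$workdir/cut_positions.txt"
printf 'k141_32328\t1\t180\t179\tphnW\n'          >> "$workdir/cut_positions.txt"

# For each row, report whether length == end - start (end exclusive)
# or length == end - start + 1 (end inclusive).
check=$(awk -F '\t' '{
    if ($4 == $3 - $2)          kind = "end-exclusive"
    else if ($4 == $3 - $2 + 1) kind = "end-inclusive"
    else                        kind = "other"
    print $1, kind
}' "$workdir/cut_positions.txt")

echo "$check"
```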
The second file, string_file.txt, contains the name (adding or removing the ">" in either file would be no problem) followed by the string. The real strings are much longer, up to 1,000,000 characters:
>k141_10671 CCTTCCCCCACACGCCGCTCTTCCGCTCTTGCTGGCC
>k141_10707 AGGCGGTATCAGACCTTGCCGCAACACTAAGCCCAGTAACGCTGTCGCCCTTATATCTGA
>k141_11190 CTTTTGTGACAGTGCAGGGCAATGGTGGATTTATCAGTATCGGGCAGAA
>k141_1479 AGCCGACAGCAGCGCCGAGGGCACATAATCCGATGACACGATGTCCAAAAGATCCGCCTCGGC
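Because the names in string_file.txt carry a leading ">" and those in cut_positions.txt do not, any lookup needs one normalized key. A minimal sketch, assuming whitespace-separated name/string pairs as in the sample, that strips the ">" with awk's sub() and prints each string's length (useful for checking that no cut runs past the end of its string):

```shell
#!/usr/bin/env bash
# Sketch: normalize names in string_file.txt and report string lengths.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two sample lines copied from above (space-separated, as shown there).
cat > "$workdir/string_file.txt" <<'EOF'
>k141_10671 CCTTCCCCCACACGCCGCTCTTCCGCTCTTGCTGGCC
>k141_10707 AGGCGGTATCAGACCTTGCCGCAACACTAAGCCCAGTAACGCTGTCGCCCTTATATCTGA
EOF

# sub() removes a leading ">" from field 1 in place;
# length($2) is the number of characters in the string.
lengths=$(awk '{ sub(/^>/, "", $1); print $1, length($2) }' "$workdir/string_file.txt")

echo "$lengths"
```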
I want to use the first column of cut_positions.txt to find the matching line in string_file.txt, then the second column as the start position of the substring and the fourth column as its length. This should be done for every line in cut_positions.txt, with all results written to a new file, out.txt. As a first step, I tried the following (on my original data):
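One common way to do this without a shell loop is a single awk program that reads both files. The NR == FNR idiom (true only while awk reads its first file) stores the small cut_positions.txt in memory, and string_file.txt is then streamed line by line, so the long strings are never all held at once. This is one possible approach sketched on toy data, not a tested solution for the real files; the cut rows are hypothetical (chosen so their names exist in the sample string file), and the output format (name, tab, substring) is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical cut rows whose names exist in the sample string file.
printf 'k141_10671\t3\t10\t7\tphnW\n' >  cut_positions.txt
printf 'k141_10707\t1\t6\t5\tphnW\n'  >> cut_positions.txt
printf 'k141_99999\t1\t5\t4\tphnW\n'  >> cut_positions.txt   # no matching string

cat > string_file.txt <<'EOF'
>k141_10671 CCTTCCCCCACACGCCGCTCTTCCGCTCTTGCTGGCC
>k141_10707 AGGCGGTATCAGACCTTGCCGCAACACTAAGCCCAGTAACGCTGTCGCCCTTATATCTGA
EOF

awk '
    # Pass 1 (NR == FNR): the small cut file. Remember every cut per name.
    NR == FNR {
        n = ++count[$1]
        start[$1, n] = $2
        len[$1, n]   = $4
        next
    }
    # Pass 2: stream the large string file, one line at a time.
    {
        name = $1
        sub(/^>/, "", name)                 # drop the leading ">"
        if (name in count)
            for (i = 1; i <= count[name]; i++)
                print name "\t" substr($2, start[name, i], len[name, i])
    }
' cut_positions.txt string_file.txt > out.txt

cat out.txt
```

Results come out in the order of string_file.txt rather than cut_positions.txt, and a name listed in several rows of cut_positions.txt produces several output lines. Awk's default field splitting is used for both files, so tabs and spaces both work as separators.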
grep ">k141_28027\b" test_out_one_line.txt | awk '{print substr($2,57251,69)}'
TCACTTGAGCGCAATTATTCGCTCTCCGGCGGCGTCAGCATCAGCCTGATCATGCGTCACCAAAAGTGT
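The manual command can be generalized by passing the name, start and length into awk with -v instead of writing them into the quoted program; an exact comparison on the first field then replaces grep's word-boundary match. A minimal sketch; the function name cut_one is hypothetical, it assumes the names in the string file start with ">", and it stops at the first match, so it assumes each name occurs only once:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a substring of the string named NAME.
# Usage: cut_one NAME START LENGTH FILE
# awk -v passes the shell values in, so no quote-splicing is needed.
cut_one() {
    awk -v name=">$1" -v start="$2" -v len="$3" \
        '$1 == name { print substr($2, start, len); exit }' "$4"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/string_file.txt" <<'EOF'
>k141_10671 CCTTCCCCCACACGCCGCTCTTCCGCTCTTGCTGGCC
>k141_11190 CTTTTGTGACAGTGCAGGGCAATGGTGGATTTATCAGTATCGGGCAGAA
EOF

result=$(cut_one k141_11190 1 5 "$workdir/string_file.txt")
echo "$result"
```

With the original data, the manual example above would become: cut_one k141_28027 57251 69 test_out_one_line.txt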
This works well as a manual, one-off command. I also figured out how to access individual fields in cut_positions.txt (here, the second column of the first row):
awk -F '\t' 'NR==1{print $2}' cut_positions.txt
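Instead of addressing one row at a time with NR==1, NR==2 and so on, bash's read builtin can hand every row to a loop body with each column in its own variable. A sketch using two of the sample rows (the temporary file is only for the demonstration):

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'k141_23852\t69\t452\t383\tphnW\n' >  "$workdir/cut_positions.txt"
printf 'k141_32328\t1\t180\t179\tphnW\n'  >> "$workdir/cut_positions.txt"

# IFS=$'\t' splits each line on tabs only; -r keeps backslashes literal.
# Each column lands in its own variable, one row per iteration.
rows=$(while IFS=$'\t' read -r name start end len comment; do
    printf '%s starts at %s, length %s\n' "$name" "$start" "$len"
done < "$workdir/cut_positions.txt")

echo "$rows"
```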
However, I can't figure out how to turn this into a loop, because I don't know how to connect the redirections and pipes I used for the individual steps. Any help is much appreciated (and let me know if you need more sample data).
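The pieces can be joined by putting the per-row command inside a while read loop and attaching both redirections to the loop as a whole: the input file after done feeds read, and a single > out.txt collects everything the loop prints. A minimal sketch with hypothetical cut rows whose names exist in the sample string file; it assumes unique names with a leading ">" in string_file.txt:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical cut rows; names chosen to exist in the string file below.
printf 'k141_10671\t11\t15\t4\tphnW\n' >  cut_positions.txt
printf 'k141_11190\t1\t5\t4\tphnW\n'   >> cut_positions.txt

cat > string_file.txt <<'EOF'
>k141_10671 CCTTCCCCCACACGCCGCTCTTCCGCTCTTGCTGGCC
>k141_11190 CTTTTGTGACAGTGCAGGGCAATGGTGGATTTATCAGTATCGGGCAGAA
EOF

# One iteration per row of cut_positions.txt. Both redirections sit after
# "done", so they apply to the whole loop, not to each command inside it.
# awk reads string_file.txt as a file argument, so it does not consume
# the loop's stdin.
while IFS=$'\t' read -r name start end len comment; do
    awk -v name=">$name" -v start="$start" -v len="$len" \
        '$1 == name { print substr($2, start, len); exit }' string_file.txt
done < cut_positions.txt > out.txt

cat out.txt
```

This rereads string_file.txt once per row of cut_positions.txt, which may become slow with many rows and strings of up to 1,000,000 characters; a single awk program that reads each file only once avoids that cost.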
Thanks, crazysantaclaus