I'm relatively new to scripting, so apologies in advance for what is probably a painfully simple problem. I've searched fairly thoroughly, but none of the answers or cookbooks I found were explicit enough for me to follow (like here; I still couldn't get it).
I have a file made up of strings of letters (DNA, if you care), one string per line. Above each string I've inserted another line that identifies the string below it. For the bioinformaticians among you: I'm trying to put together a test data set in FASTA format, so maybe you already have tools for this? Anyway, I've put a distinct word, "num", after each ">" with the intention of using a bash incrementer and sed to give each string a unique number in its header. For example, in data.txt, I have...
>num, blah, blah, blah
ATCGACTGAATCGA
>num, blah, blah, blah
ATCGATCGATCGATCG
>num, blah, blah, blah
ATCGATCGATCGATCG
I would like it to be...
>0, blah, blah, blah
ATCGACTGAATCGA
>1, blah, blah, blah
ATCGATCGATCGATCG
>2, blah, blah, blah
ATCGATCGATCGATCG
The solution can be in any language as long as it's complete && gets the job done. I have a little experience with sed, awk, bash, and C++ (little == slightly more than no experience). I know, I know, I need to learn Perl, but I've only just started. The question is this: how do I replace "num" with a number that increments on each replacement? It doesn't matter if a DNA string is identical to another one elsewhere in the file. Thanks in advance for your help!
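To make the intended behaviour concrete, here is a minimal sketch of what I'm after, written in Python purely for illustration. The function name number_headers and its placeholder parameter are made up for this example, and it assumes that "num" appears immediately after the ">" at the start of each header line and that numbering starts at 0, as in the sample above.

```python
import re
from itertools import count


def number_headers(lines, placeholder="num"):
    """Replace the placeholder right after '>' with 0, 1, 2, ... on header lines.

    Hypothetical helper: assumes each header line starts with '>' followed
    directly by the placeholder word. Sequence lines are passed through as-is.
    """
    counter = count()  # yields 0, 1, 2, ...
    pattern = re.compile(r"^>" + re.escape(placeholder) + r"\b")
    numbered = []
    for line in lines:
        if pattern.match(line):
            # Only advance the counter when a header is actually rewritten.
            line = pattern.sub(">" + str(next(counter)), line, count=1)
        numbered.append(line)
    return numbered


# The sample from the question, one list element per line of data.txt.
sample = [
    ">num, blah, blah, blah",
    "ATCGACTGAATCGA",
    ">num, blah, blah, blah",
    "ATCGATCGATCGATCG",
    ">num, blah, blah, blah",
    "ATCGATCGATCGATCG",
]

for line in number_headers(sample):
    print(line)
```

For a real file, the same function could be fed the lines of data.txt (with trailing newlines stripped) and the result written to a new file. Anchoring the pattern at the start of the line is a deliberate choice here, so that a sequence or description that happens to contain "num" elsewhere is left alone.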