I have several files in a directory, and in some of them certain patterns occur multiple times. For example, the contents of file "8_list":
Spiroplasma_taiwanense
Spiroplasma_diminutum
Spiroplasma_apis
Spiroplasma_sabaudiense
Spiroplasma_taiwanense
Spiroplasma_diminutum
Spiroplasma_taiwanense
EntAcro10
EntAcro10
Spiroplasma_apis
Spiroplasma_culicicola
Spiroplasma_sabaudiense
Spiroplasma_diminutum
Spiroplasma_sabaudiense
Spiroplasma_sabaudiense
Spiroplasma_sabaudiense
Spiroplasma_apis
Spiroplasma_culicicola
Spiroplasma_culicicola
Spiroplasma_culicicola
Spiroplasma_culicicola
Spiroplasma_diminutum
Spiroplasma_culicicola
Spiroplasma_culicicola
EntAcro1
and the contents of file "574_list":
Mesoplasma_florum_l1
Spiroplasma_sabaudiense
Mesoplasma_florum_w37
EntAcro1
All files have a single column.
What I want to do is, within each file, find the identical patterns and append a number to each one giving its occurrence count so far. For example, in file "8_list", if Spiroplasma_culicicola occurs 7 times, then the first occurrence should become Spiroplasma_culicicola_1, the second occurrence Spiroplasma_culicicola_2, the third occurrence Spiroplasma_culicicola_3, and so on.
I tried to do it with sed by looking for each pattern individually:
sed -z 's/Spiroplasma_culicicola/Spiroplasma_culicicola_2/2'
However, I was wondering whether there is an easier way to do this for all my files and all patterns in a given directory.
Thanks in advance.
This is a good task for a nice tool such as awk. In the awk program:
gsub(" ", "", $0); - removes spaces from the line (here, the trailing space at the end of the line)
a[$0]++; - increments the number of occurrences of each pattern (column value), using the column value as an array key
The output:
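A minimal sketch of how the two statements above could be combined into a full command and run over every file in a directory. The print statement, the helper name number_occurrences, the *_list glob and the ".numbered" output suffix are all assumptions made for illustration; they are not taken from the answer itself.

```shell
# number_occurrences is a hypothetical helper name (an assumption).
# It prints every line of the given file with "_<n>" appended, where n is
# the number of times that exact line has been seen so far in the file.
number_occurrences() {
    awk '{
        gsub(" ", "", $0)      # strip spaces, e.g. a trailing one
        a[$0]++                # occurrence counter keyed by the line value
        print $0 "_" a[$0]     # value, underscore, running count
    }' "$1"
}

# Run it once per file so the counters restart for each file.
# The *_list glob and the ".numbered" suffix are assumptions.
for f in *_list; do
    [ -f "$f" ] || continue
    number_occurrences "$f" > "$f.numbered"
done
```

With this sketch, values that occur only once also receive a _1 suffix, so the "574_list" sample would become:

Mesoplasma_florum_l1_1
Spiroplasma_sabaudiense_1
Mesoplasma_florum_w37_1
EntAcro1_1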