TSV: how to concatenate field 2 values when field 1 is duplicated

Posted 2019-08-07 15:00

Question:

I'm building a Swedish-English sentence deck for ANKI from the Creative Commons-licensed content of tatoeba.org.

Please help me turn sample 1 into sample 2 (preferably in bash):

#sample1
a 1
a 2
b 3
c 4
c 5

#sample2
a 1<br>2
b 3
c 4<br>5

Duplicates in field 1 are always consecutive.

Thank you!

Answer 1:

One way using awk:

awk 'p==$1{printf "<br>%s", $2;next}{if(p){print ""};p=$1;printf "%s", $0}END{print ""}' file
a 1<br>2
b 3
c 4<br>5
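The question mentions TSV and whole sentences, but the commands in these answers split on any whitespace, so a field such as "Jag är glad." would be cut at its first space. Below is a minimal sketch of the same approach that splits on tabs only. It assumes each line has exactly two tab-separated columns (any further columns are dropped), and the function name merge_tsv is hypothetical.

```shell
#!/bin/sh
# merge_tsv (hypothetical name): merge consecutive lines that share field 1,
# joining their field 2 values with <br>. Splits on tabs only, so spaces
# inside a sentence stay in the same field. Assumes two columns per line;
# any extra columns are dropped.
merge_tsv() {
  awk -F'\t' -v OFS='\t' '
    # Same key as the previous line: append field 2 to the open output line.
    NR > 1 && $1 == prev { printf "<br>%s", $2; next }
    # New key: end the previous output line (if any) and start a new one.
    {
      if (NR > 1) print ""
      printf "%s%s%s", $1, OFS, $2
      prev = $1
    }
    # Terminate the last output line.
    END { if (NR > 0) print "" }
  ' "$@"
}
```

Usage: merge_tsv sentences.tsv > deck.tsv (both file names are only examples). Using printf "%s" rather than printing $0 as a format string keeps a literal % in a sentence from being treated as a format directive.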


Answer 2:

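perl -ape '$_ = ($l eq $F[0]) ? "<br>$F[1]" : ($. > 1 ? "\n@F" : "@F"); $l = $F[0]; $_ .= "\n" if eof()' file

-a splits each line into @F. A repeated key emits only "<br>" plus field 2, a new key starts a new line (with no leading newline before the first one), and eof() adds the final newline at the end of the input.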


Answer 3:

You can also try this awk command:

awk 'BEGIN {getline; id=$1; line=$0} {if ($1 != id) {print line; line = $0; } else {line = line "<br>" $2;} id=$1;} END {print line;}' file

Output:

a 1<br>2
b 3
c 4<br>5


Answer 4:

This might work for you (GNU sed):

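sed -r ':a;$!N;s/^((\S+\s).*)\n\2/\1<br>/;ta;P;D' file

Append the next line and compare the keys. If they match, join the two lines with <br> and loop back to try the following line as well; otherwise print the current line, delete it, and repeat with the line after it. The :a ... ta loop is what lets three or more lines with the same key be merged into one.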



Answer 5:

awk '{if(a[$1]){a[$1]=a[$1]"<br>"$2}else{a[$1]=$1FS$2;b[i++]=$1}} END{for(i=0;i in b; i++) print a[b[i]];}' sample1

Output:

a 1<br>2
b 3
c 4<br>5

This builds each output line in array a and uses array b to remember the order in which the keys first appear.
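Because this approach collects everything in arrays before printing, it also groups keys that are not adjacent, at the cost of holding the whole file in memory. Below is a sketch of the same array technique that splits on tabs, assuming two-column input; the function name group_tsv is hypothetical.

```shell
#!/bin/sh
# group_tsv (hypothetical name): same array technique as Answer 5, but
# splitting on tabs. Keys need not be adjacent; output follows the order
# in which each key first appears.
group_tsv() {
  awk -F'\t' -v OFS='\t' '
    # First time this key is seen: remember its position and start its line.
    !($1 in merged) { order[n++] = $1; merged[$1] = $1 OFS $2; next }
    # Key seen before: append this field 2.
    { merged[$1] = merged[$1] "<br>" $2 }
    # Print the merged lines in first-appearance order.
    END { for (i = 0; i < n; i++) print merged[order[i]] }
  ' "$@"
}
```

For input where duplicates are already consecutive, as the question guarantees, a streaming version that keeps only the current line in memory is enough; the array version is useful when that guarantee does not hold.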



Answer 6:

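Here is another awk solution. It starts each new key on a fresh line and appends "<br>" plus field 2 for every repeat:

awk 'NR==1 || f!=$1 {printf "%s%s", (NR>1 ? RS : ""), $0; f=$1; next} {printf "<br>%s", $2} END {if (NR) printf "%s", RS}' file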
a 1<br>2
b 3
c 4<br>5