Is that possible to use awk
to values of same key into one row?
For instance
a,100
b,200
a,131
a,102
b,203
b,301
Can I convert them to a file like this:
a,100,131,102
b,200,203,301
Is that possible to use awk
to values of same key into one row?
For instance
a,100
b,200
a,131
a,102
b,203
b,301
Can I convert them to a file like this:
a,100,131,102
b,200,203,301
You can use awk like this:
awk -F, '{a[$1] = a[$1] FS $2} END{for (i in a) print i a[i]}' file
a,100,131,102
b,200,203,301
We use -F,
to use comma as delimiter and use array a
to keep aggregated value.
Reference: Effective AWK Programming
If Perl is an option,
perl -F, -lane '$a{$F[0]} = "$a{$F[0]},$F[1]"; END{for $k (sort keys %a){print "$k$a{$k}"}}' file
These command-line options are used:
-n
loop around each line of the input file-l
removes newlines before processing, and adds them back in afterwards -a
autosplit mode – split input lines into the @F
array. Defaults to splitting on whitespace. -e
execute the perl code -F
autosplit modifier, in this case splits on ,
@F
is the array of words in each line, indexed starting with $F[0]
$F[0]
is the first element in @F
(the key)
$F[1]
is the second element in @F
(the value)
%a
is a hash which stores a string containing all matches of each key
If you presort the input, it is possible to use sed
to join the lines, e.g.:
sort foo | sed -nE ':a; $p; N; s/^([^,]+)([^\n]+)\n\1/\1\2/; ta; P; s/.+\n//; ba'
The above one-liner can be saved into a script file. See below for a commented version.
parse.sed
# A goto label
:a
# Always print when on the last line
$p
# Read one more line into pattern space and join the
# two lines if the key fields are identical
N
s/^([^,]+)([^\n]+)\n\1/\1\2/
# Jump to label 'a' and redo the above commands if the
# substitution command was successful
ta
# Assuming sorted input, we have now collected all the
# fields for this key, print it and move on to the next
# key
P
s/.+\n//
ba
The logic here is as follows:
Run it like this:
sort foo | sed -nEf parse.sed
Output:
a,100,102,131
b,200,203,301
With datamash
$ datamash -st, -g1 collapse 2 <ip.txt
a,100,131,102
b,200,203,301
From manual:
-s, --sort
sort the input before grouping; this removes the need to manually pipe the input through 'sort'
-t, --field-separator=X
use X instead of TAB as field delimiter
-g, --group=X[,Y,Z]
group via fields X,[Y,Z]
collapse
comma-separated list of all input values