Merge values for same key

2020-04-08 02:02发布

问题:

Is that possible to use awk to values of same key into one row? For instance

a,100
b,200
a,131
a,102
b,203
b,301

Can I convert them to a file like this:

a,100,131,102
b,200,203,301

回答1:

You can use awk like this:

awk -F, '{a[$1] = a[$1] FS $2} END{for (i in a) print i a[i]}' file

a,100,131,102
b,200,203,301

We use -F, to use comma as delimiter and use array a to keep aggregated value.

Reference: Effective AWK Programming



回答2:

If Perl is an option,

perl -F, -lane '$a{$F[0]} = "$a{$F[0]},$F[1]"; END{for $k (sort keys %a){print "$k$a{$k}"}}' file

These command-line options are used:

  • -n loop around each line of the input file
  • -l removes newlines before processing, and adds them back in afterwards
  • -a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
  • -e execute the perl code
  • -F autosplit modifier, in this case splits on ,

@F is the array of words in each line, indexed starting with $F[0]
$F[0] is the first element in @F (the key)
$F[1] is the second element in @F (the value)
%a is a hash which stores a string containing all matches of each key



回答3:

tl;dr

If you presort the input, it is possible to use sed to join the lines, e.g.:

sort foo | sed -nE ':a; $p; N; s/^([^,]+)([^\n]+)\n\1/\1\2/; ta; P; s/.+\n//; ba'

A bit more explanation

The above one-liner can be saved into a script file. See below for a commented version.

parse.sed

# A goto label
:a

# Always print when on the last line
$p

# Read one more line into pattern space and join the
# two lines if the key fields are identical
N
s/^([^,]+)([^\n]+)\n\1/\1\2/

# Jump to label 'a' and redo the above commands if the
# substitution command was successful
ta

# Assuming sorted input, we have now collected all the
# fields for this key, print it and move on to the next
# key
P
s/.+\n//
ba

The logic here is as follows:

  1. Assume sorted input.
  2. Look at two consecutive lines. If their key fields match, remove the key from the second line and append the value to the first line.
  3. Repeat 2. until key matching fails.
  4. Print the collected values and reset to collect values for the next key.

Run it like this:

sort foo | sed -nEf parse.sed

Output:

a,100,102,131
b,200,203,301


回答4:

With datamash

$ datamash -st, -g1 collapse 2 <ip.txt
a,100,131,102
b,200,203,301

From manual:

-s, --sort

sort the input before grouping; this removes the need to manually pipe the input through 'sort'

-t, --field-separator=X

use X instead of TAB as field delimiter

-g, --group=X[,Y,Z]

group via fields X,[Y,Z]

collapse

comma-separated list of all input values