Combine results of column one, then sum column 2 to list the total for each entry in column one

Posted 2019-09-01 19:53

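I am a bit of a bash newbie, so please bear with me here.

I have a text file, dumped by another piece of software (which I have no control over), that lists each user together with the number of times they accessed a certain resource. It looks like this: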

Jim 109
Bob 94
John 92
Sean 91
Mark 85
Richard 84
Jim  79
Bob  70
John 67
Sean 62
Mark 59
Richard 58
Jim  57
Bob  55
John 49
Sean 48
Mark 46
.
.
.

My goal is to get output like this:

Jim  [Total for Jim]
Bob  [Total for Bob]
John [Total for John]

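And so on.

The names change each time I run the query in the software, so a static search for each name followed by piping through wc does not help.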

Answer 1:

This sounds like a job for awk :) Pipe the output of your program to the following awk script:

your_program | awk '{a[$1]+=$2}END{for(name in a)print name " " a[name]}'

Output:

Sean 201
Bob 219
Jim 245
Mark 190
Richard 142
John 208

The awk script itself is easier to explain in this format:

# executed on each line
{
  # 'a' is an array. It will be initialized
  # as an empty array by awk on its first usage
  # '$1' contains the first column - the name
  # '$2' contains the second column - the amount
  #
  #  on every line the total score of 'name' 
  #  will be incremented  by 'amount'
  a[$1]+=$2
}
# executed at the end of input
END{
  # print every name and its score
  for(name in a)print name " " a[name]
}
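
To try the script without the real program, a few of the sample rows can be fed in from a here-document. This is only a sketch: `sum_by_name` is a hypothetical wrapper name, the `NF >= 2` guard (which skips blank or one-field lines) is an addition not in the answer above, and the trailing `sort` is there just to make the order repeatable, because awk visits array keys in an unspecified order.

```shell
# sum_by_name: hypothetical helper. Reads "name count" lines on stdin
# and prints one "name total" line per distinct name.
sum_by_name() {
  awk 'NF >= 2 { total[$1] += $2 }
       END { for (name in total) print name, total[name] }'
}

# Feed four of the sample rows; sort only makes the order deterministic.
sum_by_name <<'EOF' | sort
Jim 109
Bob 94
Jim  79
Bob  70
EOF
```

With these four rows it prints `Bob 164` and then `Jim 188`.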

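Note that to get the output sorted by score, you can add another pipe to sort -k2,2nr, which sorts on the second column numerically and in reverse order (a plain sort -r -k2 compares the column as text, which only works while every total has the same number of digits):

your_program | awk '{a[$1]+=$2}END{for(n in a)print n" "a[n]}' | sort -k2,2nr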

Output:

Jim 245
Bob 219
John 208
Sean 201
Mark 190
Richard 142
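
If the totals should instead appear in the order in which each name first shows up in the input (as in the desired output in the question), one possible variation is to record that order in a second array. This is a sketch rather than part of the answer above; `sum_in_input_order`, `order`, and `total` are names chosen here for illustration.

```shell
# sum_in_input_order: hypothetical helper that keeps first-appearance order.
sum_in_input_order() {
  awk 'NF >= 2 {
         if (!($1 in total)) order[++n] = $1   # remember each new name once
         total[$1] += $2
       }
       END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }'
}

# Small demonstration with a few of the sample rows.
sum_in_input_order <<'EOF'
Jim 109
Bob 94
Jim  79
Bob  70
EOF
```

For these rows it prints `Jim 188` followed by `Bob 164`. In real use it would sit at the end of the pipe, as in `your_program | sum_in_input_order`.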


Answer 2:

Pure bash:

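declare -A result                 # an associative array (needs bash 4 or later)

# -r keeps read from treating backslashes as escapes
while read -r name value; do
  ((result[$name]+=value))
done < "$infile"

# quote the expansions so names are never word-split or globbed
for name in "${!result[@]}"; do
  printf  "%-10s%10d\n"  "$name"  "${result[$name]}"
done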

If the first "done" has no redirection from the input file, the script can be used with a pipe:

your_program | ./script.sh

And to sort the output:

your_program | ./script.sh | sort

Output (without sort; bash returns associative array keys in no particular order):

Bob              219
Richard          142
Jim              245
Mark             190
John             208
Sean             201


Answer 3:

GNU datamash

datamash -W -s -g1 sum 2 < input.txt

Output:

Bob     219
Jim     245
John    208
Mark    190
Richard 142
Sean    201


Source: Combine results of column one Then sum column 2 to list total for each entry in column one