Python: merging tally data

Posted 2019-07-06 22:00

Okay - I'm sure this has been answered here before but I can't find it....

My problem: I have a list of lists with this composition

0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C

My goal is to output the following:

0.6 A
0.3 B
0.7 C

In other words, I need to merge the data from multiple lines together.

Here's the code I'm using:

unique_percents = []

for line in percents:
    new_percent = float(line[0])
    for inner_line in percents:
        if line[1] == inner_line[1]:
           new_percent += float(inner_line[0])
        else:
            temp = []
            temp.append(new_percent)
            temp.append(line[1])
            unique_percents.append(temp)
            break

I think it should work, but it's not adding the percents up and still has the duplicates. Perhaps I'm not understanding how "break" works?

I'll also take suggestions of a better loop structure or algorithm to use. Thanks, David.

11 Answers
地球回转人心会变
#2 · 2019-07-06 22:34
letters = {}
# "data" holds one "<percent> <letter>" pair per line
with open("data", "r") as f:
    for line in f:
        fields = line.strip().split()
        percent = float(fields[0])
        letter = fields[1]
        # add to the running total for this letter
        if letter in letters:
            letters[letter] = percent + letters[letter]
        else:
            letters[letter] = percent

for letter, percent in letters.items():
    print letter, percent

A 0.6
C 0.7
B 0.3
Juvenile、少年°
#3 · 2019-07-06 22:37

This is verbose, but works:

# Python 2.7
lines = """0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C"""

lines = lines.split('\n')
#print(lines)
pctg2total = {}
thing2index = {}
index = 0
for line in lines:
    pctg, thing = line.split()
    pctg = float(pctg)
    if thing not in thing2index:
        thing2index[thing] = index
        index = index + 1
        pctg2total[thing] = pctg
    else:
        pctg2total[thing] = pctg2total[thing] + pctg
output = ((pctg2total[thing], thing) for thing in pctg2total)
# Let's sort by the first occurrence.
output = list(sorted(output, key = lambda thing: thing2index[thing[1]]))
print(output)

>>> 
[(0.60000000000000009, 'A'), (0.29999999999999999, 'B'), (0.69999999999999996, 'C')]
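For what it's worth, on Python 3.7 and later a plain dict remembers insertion order, so the separate thing2index bookkeeping isn't needed there. A minimal sketch under that assumption (the tally helper name and the rounding to one decimal place are my own choices for illustration, not part of the answer above):

```python
def tally(lines):
    """Sum the percentage for each label, keeping first-seen order."""
    totals = {}  # dicts preserve insertion order on Python 3.7+
    for line in lines:
        pctg, thing = line.split()
        totals[thing] = totals.get(thing, 0.0) + float(pctg)
    # Round only for display; one decimal place is an assumption
    # that happens to match the sample data.
    return [(round(total, 1), thing) for thing, total in totals.items()]


lines = """0.2 A
0.1 A
0.3 A
0.3 B
0.2 C
0.5 C""".split("\n")

print(tally(lines))
```

Rounding at the very end keeps the running sums at full precision and only tidies what gets displayed.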
等我变得足够好
#4 · 2019-07-06 22:44

You want to use a dict, but collections.defaultdict can come in really handy here so that you don't have to worry about whether the key exists in the dict or not -- it just defaults to 0.0:

import collections

lines = [[0.2, 'A'], [0.1, 'A'], [0.3, 'A'], [0.3, 'B'], [0.2, 'C'], [0.5, 'C']]
amounts = collections.defaultdict(float)
for amount, letter in lines:
    amounts[letter] += amount

for letter, amount in sorted(amounts.iteritems()):
    print amount, letter
成全新的幸福
#5 · 2019-07-06 22:45

If you have a list of lists like this: [[0.2, 'A'], [0.1, 'A'], ...] (in fact, it looks like a list of tuples :)

res_dict = {}

for pair in lst:
    letter = pair[1]
    val = pair[0]
    try:
        res_dict[letter] += val
    except KeyError:
        res_dict[letter] = val

# iterate over .items(); looping over the dict itself yields only the keys
res_lst = [(val, letter) for letter, val in res_dict.items()] # note, a list of tuples!
欢心
#6 · 2019-07-06 22:46

Since all of the letter grades are grouped together, you can use itertools.groupby (and if not, just sort the list ahead of time to make them so):

data = [
    [0.2, 'A'],
    [0.1, 'A'],
    [0.3, 'A'],
    [0.3, 'B'],
    [0.2, 'C'],
    [0.5, 'C'],
]

from itertools import groupby

summary = dict((k, sum(i[0] for i in items)) 
                for k,items in groupby(data, key=lambda x:x[1]))

print summary

Gives:

{'A': 0.60000000000000009, 'C': 0.69999999999999996, 'B': 0.29999999999999999}
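If the rows are not already grouped, the "sort ahead of time" step might look roughly like this. It is only a sketch, written for Python 3; the unsorted_data list is a made-up input for illustration, and by_letter is just a name I picked for the shared key function:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input where equal letters are NOT adjacent.
unsorted_data = [
    [0.2, 'C'],
    [0.2, 'A'],
    [0.3, 'B'],
    [0.5, 'C'],
    [0.1, 'A'],
    [0.3, 'A'],
]

by_letter = itemgetter(1)
# groupby only merges neighbouring items, so sort by the same key first.
ordered = sorted(unsorted_data, key=by_letter)
summary = {letter: sum(row[0] for row in rows)
           for letter, rows in groupby(ordered, key=by_letter)}

print(summary)
```

Using the same key function for both sorted and groupby is what guarantees each letter ends up in exactly one group.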
霸刀☆藐视天下
#7 · 2019-07-06 22:48
total = {}
data = [('0.1', 'A'), ('0.2', 'A'), ('.3', 'B'), ('.4', 'B'), ('-10', 'C')]
for amount, key in data:
    total[key] = total.get(key, 0.0) + float(amount)

for key, amount in total.items():
    print key, amount
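Since the amounts here start out as strings, one option for avoiding the float noise seen in some of the other outputs (0.60000000000000009 and friends) is to parse them with decimal.Decimal instead of float. A small sketch for Python 3, assuming exact decimal totals are what you actually want:

```python
from decimal import Decimal

data = [('0.1', 'A'), ('0.2', 'A'), ('.3', 'B'), ('.4', 'B'), ('-10', 'C')]

total = {}
for amount, key in data:
    # Decimal parses the string exactly, so 0.1 + 0.2 gives exactly 0.3.
    total[key] = total.get(key, Decimal('0')) + Decimal(amount)

for key, amount in total.items():
    print(key, amount)
```

This only helps when the values are built from strings; passing a float such as 0.1 to Decimal would carry the binary rounding error along with it.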