How to identify “keys” of a tuple/list of 3-item t

2019-08-12 05:13发布

问题:

Given a table of revenue values thus:

A key point to note (and the core of my question) is the the brand name will almost always, but not always, contain the corresponding product name. In the case of last Banana entry, it doesn't.

I will extract a dict of Brand<->Revenue pairs, fist accounting for those brands that have multiple entries, and summing in those cases, using the approach described here. So:

revenuePerBrandDict = {}
brandRevenueTuples = []
i=0
for brand in ourTab.columns[1][1:-1]: # ignore first (zeroth) and last row
    brandRevenueTuples.append((campaign.value, round(ourTab.columns[3][i].value,2)))
    i+=1
for key, value in brandRevenueTuples:
        revenuePerBrandDict[key] = revenuePerBrandDict.get(key, 0) + value

I will then cross-reference the keys and values in this dict to each dict in (dict of banana expenses, dict of kiwi expenses etc.), and subtract expenses from revenue, item per item. These dicts will be extracted from banana table, kiwi table etc. that look like this:

If the brand name always contained the product name in the revenue table, then in order to compile an appropriate collection of revenue values for comparison with the Banana expenses dict, for example, I would just extract all those brands whose name contained 'Banana', and for matching keys in the Banana expenses dict, perform the extraction on their values.

But it doesn't, so I need another way of knowing that in the Revenue dict, 'OtherBrand' is a Banana. (In the Banana dict, I already know it is a Banana, because it came from the Banana table). Instead of extracting a dict of Brand<->Revenue pairs, I could extract a list or tuple of (tuples of (Product, Brand, Revenue)), and now we have the additional information provided by the Product column. But since a tuple doesn't have the concept of a key, how do I iterate across this new collection, extracting revenue per tuple in the desired way (i.e. with recognition that that OtherBrand is a Banana etc.)

回答1:

You can use the fruits as keys and group the brands:

from collections import defaultdict
import csv

with open("in.csv") as f:
    r = csv.reader(f)
    next(r) # skip header
    # fruite will be keys, values will be dicts
    # with brands as keys  and running totals for rev as values
    d = defaultdict(lambda: defaultdict(int))
    for fruit, brand, rev in r:
        d[fruit][brand] += float(rev)

Which using your input outputs:

from pprint import pprint as pp

pp(dict(d))
{'Apple': defaultdict(<type 'int'>, {'CrunchApple': 1.7}),
 'Banana': defaultdict(<type 'int'>, {'BananaBrand': 4.0,   'OtherBrand': 3.2}),
 'Kiwi': defaultdict(<type 'int'>, {'NZKiwi': 1.2}),
 'Pear': defaultdict(<type 'int'>, {'PearShaped': 6.2})

You can then subtract the expenses using the keys.

Using pandas life is even easier you can groupby and sum:

import pandas as pd

df = pd.read_csv("in.csv")

print(df.groupby(("A","B")).sum())

Output:

A      B               
Apple  CrunchApple  1.7
Banana BananaBrand  4.0
       OtherBrand   3.2
Kiwi   NZKiwi       1.2
Pear   PearShaped   6.2

Or get the groups by fruit and brand:

groups = df.groupby(["A","B"])

print(groups.get_group(('Banana', 'OtherBrand')))

print(groups.get_group(('Banana', 'BananaBrand')))


回答2:

It seems to me that you want to group your data from the first table by product type. I suggest a dictionary where the key is the product type and the value is a list of tuples [(brand, revenue),(..., ...)].

Then, for each product type in the dictionary, you can easily pull out the list of brands for that product and, if needed, make a new dictionary containing lists of 3-tuples (brand, revenue, expenses).