Pythonistic way to intersect and add elements of l

2019-05-15 04:13发布

问题:

I have 3 lists, a, b and c

Each of this lists contains tuples with 3 numbers.

Here is an example input:

a = [(1,2,4),(1,7,8),(1,5,4),(3,6,7)]
b = [(1,2,5),(1,9,3),(1,0,3),(3,6,8)]
c = [(2,6,3),(2,4,9),(2,8,5),(1,2,7)]

I'm looking for a way to generate a list that takes elements of those 3 lists if the two firsts items of eachs tuple are equals, and addind the third element.

In the data I gave, there is only 1 set of tuple with the 2 first values equals : (1,2,4), (1,2,5) and (1,2,7).

If I add their third value I have 4+5+7 = 16, so with those data, I should have [(1,2,16)] in the end.

The two first values are uniques in each list, [(1,2,7),(1,2,15)] won't exist.

The problem isn't finding tuples where only the two first values are equals, it's easyly done with a list comprehension. But I'm stuck on finding a pythonistic way to add the third value at the same time.

I can make this:

elem_list = []
for elem in a:
    b_elem = [i for i in b if i[:-1] == elem[:-1]]
    c_elem = [i for i in c if i[:-1] == elem[:-1]]
    if len(b_elem) != 0 and len(c_elem) != 0:
        elem_list.append((elem[0],elem[1], elem[2]+b_elem[0][2]+c_elem[0][2]))        

That give me the desired output, but it's really long, and that's why I'm sure the is a pythonistic way to do this without trouble, I just can't figure it out.

回答1:

Here's one way to do it:

from itertools import product, starmap

def solve(*tups):
    key = tups[0][:2]
    if all(x[:2] == key for x in tups):
        return key + (sum(x[2] for x in tups), )

for p in product(a, b, c):
    out = solve(*p)
    if out:
        print out
        #(1, 2, 16)

Or a one-liner using the above function:

print filter(None, starmap(solve, product(a, b, c)))
#[(1, 2, 16)]


回答2:

Not very efficient but will do what you want:

a = [(1,2,4),(1,7,8),(1,5,4),(3,6,7)]
b = [(1,2,5),(1,9,3),(1,0,3),(3,6,8)]
c = [(2,6,3),(2,4,9),(2,8,5),(1,2,7)]
from itertools import product

print(filter(lambda x: x[0][:2] == x[1][:2] == x[2][:2] ,product(a,b,c)))

[((1, 2, 4), (1, 2, 5), (1, 2, 7))]


回答3:

Here is one way without considering any efficiency (it loops i * j * k times assuming i,j and k are the lengths of your lists a,b,c).

from operator import itemgetter
f = itemgetter(0,1)    
print [(x[0],x[1],x[2]+y[2]+z[2]) for x in a for y in b for z in c if f(x)==f(y)==f(z)]

output:

[(1, 2, 16)]


回答4:

Use the first two elements of the triple as a key to a dictionary. Add the third element of the triple and use that as the value of a dictionary entry.

d = {}
# group the tuples and sum 
for x,y,z in a+b+c:
    d[(x,y)] = d.get((x,y), 0) + z

results = []
# sort the keys and convert to a list of tuples
for k in sorted(d.keys()):
    x,y = k
    results.append((x,y,d[(x,y)]))

print results


回答5:

For good measure, here's a nice boring way that separates the locations with your custom match logic (e.g. the first two tuple components), the transformation logic (e.g. summing the third component), into plain old boring helper functions, and then does a simple recursive call with a boring for-loop (shrinking each time by filtering out non-matches) -- which is one way to avoid the wasteful call to itertools.product or starmap.

from functools import partial
from operator import eq, is_not, itemgetter

a = [(1,2,4),(1,7,8),(1,5,4),(3,6,7)]
b = [(1,2,5),(1,9,3),(1,0,3),(3,6,8)]
c = [(2,6,3),(2,4,9),(2,8,5),(1,2,7)]

is_not_none = partial(is_not, None)

def my_match_criterion(t1, t2):
    return eq(*map(itemgetter(0,1), (t1, t2)))

def my_transformation(t1, t2):
    return t1[0:2] + (t1[2] + t2[2],)

def collapse_matches_with_transformation(tups, *args):
    if args == ():
        return tups
    else:   
        collapsed = collapse_matches_with_transformation(*args)
        for i,c in enumerate(collapsed):
            include = False
            for t in tups:
                if my_match_criterion(t, c):
                    collapsed[i], include = my_transformation(t, c), True
            if not include:
                collapsed[i] = None
        return filter(is_not_none, collapsed) 

print collapse_matches_with_transformation(a, b, c)

I probably represent the grumpy contrarian -- my opinion is that this is at least as Pythonic as any comprehension business. It's become a bit too fashionable to appropriate the term "Pythonic" to mean "brevity of syntax at all costs." This is perpetuated by many folks who become so accustomed to looking at one-liner comprehensions or inline lambda functions serving as key arguments. The artificial ease with which they can read these things, an artifact of mere familiarity, clouds thinking about whether that way really is more "readable" and certainly whether it is good from an encapsulation point of view.

Of course if your problem only has to be solved one single time on one single small instance, like when playing around in the interpreter, then whatever works...

But if you might revisit this code, if there's even a slight chance of that happening, why not write the different pieces of your requirements into different, separated functions?

In this problem there are several things at work: (1) how many lists-of-tuples might need to be processed? Is it always only 3? (2) How likely is the matching condition to change? If you suddenly need to include a new piece of data, to make your tuples 4-tuples that match on the first 3 elements, how much code needs to change and in how many places? (3) What if you need to change the transformation? Instead of summing across the 3rd element, what if you need to multiply, or sum additional elements?

These considerations have to be budgeted for in almost any real problem (read: any location you use this code more than one single time).

In any of these cases, all of the junk code involving throwing around lots of stuff like lambda x: x[0:2] ... blah or just putting the logic x[2] + y[2] + z[2] right into the comprehension that returns the result, etc., only gives false brevity because the system is very fragile w.r.t. the assumptions that it's just exactly 3 lists of 3-tuples whose 3rd components only ever need to be summed under the sole matching condition that the first two components match.

Finally, even if you know that all of these things will be fixed, the better way to provide brevity is to change the data structure. For example, if you first convert your lists-of-tuples into lists-of-Counters, with the sub-tuple of the first two elements as keys, it's very fast to do this operation:

from collections import Counter

def countify(tups):
    return Counter({t[0:2]:t[2] for t in tups})

a, b, c = map(countify, (a,b,c))
common_keys = set(a.keys()).intersection(b, c)
totals = a + b + c

print [k + (itemgetter(k)(totals),) for k in common_keys]

Most folks would surely say that this second approach is more "Pythonic" -- but really this is only true if you don't mind severely committing to the fact that you are summing values keyed on the first two components of the original tuples. The code doesn't cleanly generalize to more than 3 lists-of-tuples, it isn't robust to slight changes in the transformation or the data presentation.

That's why I think this kind of code golf brevity should not be synonymous with "Pythonic" -- which should be much more about what is pragmatic and straightforward. "Simple" does not necessarily mean "brief" and "complicated" is not perfectly correlated with "more lines of code." "Readability" is extremely subjective and varies hugely by experience.

Do the boring thing! Write that extra helper function! Write that for-loop! You'll be more excited that you did later!