I have a large list, an excerpt of which looks like:
power = [
['1234-43211', [5, 6, -4, 11, 22]],
['1234-783411', [43, -5, 0, 0, -1]],
['1234-537611', [3, 0, -5, -6, 0]],
['1567-345411', [4, 6, 8, 3, 3]],
['1567-998711', [1, 2, 1, -4, 5]]
]
The first number in each string (the part before the hyphen) identifies the station, and it is what I want to group my additions by. That is, I only want to cumulatively add the values within each station (returning each intermediate sum), and never add values from two different stations.
My goal is to iterate over this list and add cumulatively the int values for a station, return each addition, then start again when the next station is detected in the list.
Desired result:
new = [
    [48, 1, -4, 11, 21],
    [51, 1, -9, 5, 21], '### End of 1234 ###'
    [5, 8, 9, -1, 8], '### End of 1567 ###'
]
or something similar to this.
I have tried the following:
for i in range(len(power)-1):
    front_num_1 = power[i][0].split('-')[0]
    front_num_2 = power[i+1][0].split('-')[0]
    station = '%s' % (front_num_1)
    j = power[i][1]
    k = power[i+1][1]
    if front_num_1 == front_num_2:
        print [k + j for k, j in zip(j, k)]
    elif front_num_1 != front_num_2:
        print '#####################################'
    else:
        print 'END'
However, this addition is not cumulative (it only adds each pair of neighbouring rows), so it is of no use.
from itertools import groupby, islice
def accumulate(iterable):  # in py 3 use itertools.accumulate
    ''' Simplified version of accumulate from python 3'''
    it = iter(iterable)
    total = next(it)
    yield total
    for element in it:
        total += element
        yield total
power = [
['1234-4321-1', [5, 6, -4, 11, 22]],
['1234-7834-1', [43, -5, 0, 0, -1]],
['1234-5376-1', [3, 0, -5, -6, 0]],
['1567-3454-1', [4, 6, 8, 3, 3]],
['1567-9987-1-', [1, 2, 1, -4, 5]]
]
groups = ((k, (nums for station, nums in g))
          for k, g in
          groupby(power, lambda x: x[0].partition('-')[0]))
new = [(station, zip(*(islice(accumulate(col), 1, None) for col in zip(*nums))))
       for station, nums in groups]
print new
print dict(new) # or as a dictionary which is unordered
Output
[('1234', [(48, 1, -4, 11, 21), (51, 1, -9, 5, 21)]), ('1567', [(5, 8, 9, -1, 8)])]
{'1234': [(48, 1, -4, 11, 21), (51, 1, -9, 5, 21)], '1567': [(5, 8, 9, -1, 8)]}
How this works:
First the lists are grouped based on the station using itertools.groupby.
E.g.
nums = [[5, 6, -4, 11, 22],
[43, -5, 0, 0, -1],
[3, 0, -5, -6, 0]]
is the first group. As you can see it is in the form of a matrix.
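To see the grouping step in isolation, here is a minimal Python 3 sketch. The helper name station_of is hypothetical (it is not part of the answer's code), and the sketch assumes the entries for each station are already adjacent in the list, since groupby only merges neighbouring items with equal keys.

```python
from itertools import groupby

power = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]],
]


def station_of(entry):
    # Hypothetical helper: the text before the first '-' in the label.
    return entry[0].partition('-')[0]


# Materialise each group right away; a groupby group is only valid
# until the outer iterator advances to the next key.
grouped = [(station, [nums for _, nums in entries])
           for station, entries in groupby(power, key=station_of)]

for station, nums in grouped:
    print(station, nums)
```

If the stations were interleaved, sorting the list by the same key first would be one way to make them adjacent.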
zip(*nums)
transposes a matrix using argument unpacking. It calls
zip([5, 6, -4, 11, 22], [43, -5, 0, 0, -1], [3, 0, -5, -6, 0])
which creates the list:
cols = [(5, 43, 3), (6, -5, 0), (-4, 0, -5), (11, 0, -6), (22, -1, 0)]
Then accumulate is called on each column. Here's what that looks like:
>>> [list(accumulate(col)) for col in cols]
[[5, 48, 51], [6, 1, 1], [-4, -4, -9], [11, 11, 5], [22, 21, 21]]
As you can see, the first element in each list is not required, so islice is used to take the elements from index 1 until the end (None). Here's what that looks like:
>>> [list(islice(accumulate(col), 1, None)) for col in cols]
[[48, 51], [1, 1], [-4, -9], [11, 5], [21, 21]]
Now we just need to transpose this back.
>>> zip(*(islice(accumulate(col), 1, None) for col in cols))
[(48, 1, -4, 11, 21), (51, 1, -9, 5, 21)]
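For readers on Python 3, the same steps can be sketched with the built-in itertools.accumulate. The function name running_totals is an assumption made for illustration. Note that zip returns an iterator in Python 3, so the result is wrapped in list().

```python
from itertools import accumulate, groupby, islice

power = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]],
]


def running_totals(entries):
    """Hypothetical helper: cumulative row sums per station, first row dropped."""
    result = []
    by_station = groupby(entries, key=lambda e: e[0].partition('-')[0])
    for station, group in by_station:
        rows = [nums for _, nums in group]
        cols = zip(*rows)  # transpose rows into columns
        # Running sum down each column, skipping the first (unsummed) value.
        summed_cols = (islice(accumulate(col), 1, None) for col in cols)
        result.append((station, list(zip(*summed_cols))))  # transpose back
    return result


print(running_totals(power))
```

A station with only one row produces an empty list here, because there is nothing to add to its first row.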
It would help if you broke down your problem into smaller pieces. I seem to understand that you want to 1) split your list based on some criterion, then 2) take the cumulative sum of each sublist (considering each element a vector).
For example:
stationList = [
['1234-4321-1', [5, 6, -4, 11, 22]],
['1234-7834-1', [43, -5, 0, 0, -1]],
['1234-5376-1', [3, 0, -5, -6, 0]],
['1567-3454-1', [4, 6, 8, 3, 3]],
['1567-9987-1-', [1, 2, 1, -4, 5]]
]
Becomes:
{'1234': [
    <5, 6, -4, 11, 22>,
    <5, 6, -4, 11, 22> + <43, -5, 0, 0, -1>,
    <5, 6, -4, 11, 22> + <43, -5, 0, 0, -1> + <3, 0, -5, -6, 0>
 ],
 '1567': [
    <4, 6, 8, 3, 3>,
    <4, 6, 8, 3, 3> + <1, 2, 1, -4, 5>
 ]
}
(where I use <...> to denote a hypothetical Vector object, or merely treating the list as a vector.)
Solution
from itertools import groupby, zip_longest  # Python 3; on Python 2 use izip_longest
1) To split a list based on some criterion, use itertools.groupby (see the itertools documentation), or write a generator function.
getStation = lambda x: x[0].split('-')[0]
def groupby_station(inputList):
    return groupby(inputList, key=getStation)
2) A cumulative sum can be written as a generator function. You can use numpy, or just write it yourself.
def listAdd(*lists):
    """
    listAdd([1,2,3], [10,20,30]) -> [11,22,33]
    listAdd([1,2,3], []) -> [1,2,3]
    """
    return [sum(xs) for xs in zip_longest(*lists, fillvalue=0)]
def cumSum(lists):
    """
    cumSum([[1,2],[10,20],[100,200]]) -> [1,2], [11,22], [111,222]
    """
    total = []
    for nums in lists:
        total = listAdd(total, nums)
        yield total
Now just combine the two:
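# Each group yields [label, nums] pairs, so pass only the nums to cumSum;
# list() consumes the generator before groupby moves on to the next station.
{key: list(cumSum(nums for _, nums in group)) for key, group in groupby_station(stationList)}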
Note that my definition of cumulative sum is slightly different from yours (it also yields the first row on its own); you can modify the cumSum function to match your definition.
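As one possible way to do that, here is a self-contained Python 3 sketch of the two steps combined. The snake_case names and the skip_first flag are assumptions introduced for illustration; with skip_first=True the first (unsummed) row of each station is omitted, which matches the result asked for in the question.

```python
from itertools import groupby, zip_longest

station_list = [
    ['1234-4321-1', [5, 6, -4, 11, 22]],
    ['1234-7834-1', [43, -5, 0, 0, -1]],
    ['1234-5376-1', [3, 0, -5, -6, 0]],
    ['1567-3454-1', [4, 6, 8, 3, 3]],
    ['1567-9987-1-', [1, 2, 1, -4, 5]],
]


def list_add(*lists):
    # Element-wise sum; shorter lists are padded with zeros.
    return [sum(xs) for xs in zip_longest(*lists, fillvalue=0)]


def cum_sum(lists, skip_first=False):
    # Yield the running element-wise total after each list.
    total = []
    for i, nums in enumerate(lists):
        total = list_add(total, nums)
        if skip_first and i == 0:
            continue  # hypothetical option: drop the unsummed first row
        yield total


def cumulative_by_station(entries, skip_first=False):
    # list() consumes each group before groupby advances to the next key.
    return {
        station: list(cum_sum((nums for _, nums in group), skip_first))
        for station, group in groupby(entries, key=lambda e: e[0].split('-')[0])
    }


print(cumulative_by_station(station_list))
print(cumulative_by_station(station_list, skip_first=True))
```

The dictionary keys are the station prefixes, and as with any use of groupby, entries for the same station need to be adjacent in the input.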