Question:
I have a list of lists. Each sublist has a length that varies between 1 and 100. Each sublist contains a particle ID at different times in a set of data. I would like to form lists of all particle IDs at a given time. To do this I could use something like:
list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8]]
list2 = [item[0] for item in list]
list2 would contain the first element of each sublist in list. I would like to do this operation not just for the first element, but for every element between 1 and 100. My problem is that element number 100 (or 66 or 77 or whatever) does not exist for every sublist.
Is there some way of creating a list of lists, where each sublist is the list of all particle IDs at a given time?
I have thought about trying to use numpy arrays to solve this problem, as if the lists were all the same length this would be trivial. I have tried adding -1's to the end of each list to make them all the same length, and then masking the negative numbers, but this hasn't worked for me so far. I will use the list of IDs at a given time to slice another separate array:
pos = pos[satIDs]
Answer 1:
lst = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8]]
func = lambda x: [line[x] for line in lst if len(line) > x]
func(3)
[4, 8, 7]
func(4)
[5, 8]
--update--
func = lambda x: [ (line[x],i) for i,line in enumerate(lst) if len(line) > x]
func(4)
[(5, 0), (8, 2)]
Answer 2:
You could use itertools.zip_longest. This will zip the lists together and insert None when one of the lists is exhausted.
>>> import itertools
>>> lst = [[1,2,3,4,5],['A','B','C'],['a','b','c','d','e','f','g']]
>>> list(itertools.zip_longest(*lst))
[(1, 'A', 'a'),
(2, 'B', 'b'),
(3, 'C', 'c'),
(4, None, 'd'),
(5, None, 'e'),
(None, None, 'f'),
(None, None, 'g')]
If you don't want the None elements, you can filter them out:
>>> [[x for x in sublist if x is not None] for sublist in itertools.zip_longest(*lst)]
[[1, 'A', 'a'], [2, 'B', 'b'], [3, 'C', 'c'], [4, 'd'], [5, 'e'], ['f'], ['g']]
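Filtering on None assumes None never appears as a real value in the data. A minimal sketch of the same idea applied to the particle-ID lists from the question, using a private sentinel as the fill value instead; the names _MISSING and ids_by_position are hypothetical and not part of the original answer:

```python
from itertools import zip_longest

# Hypothetical sentinel: a unique object that cannot collide with real data.
_MISSING = object()

def ids_by_position(sublists):
    """Return one list per position, skipping sublists that are too short."""
    return [
        [x for x in column if x is not _MISSING]
        for column in zip_longest(*sublists, fillvalue=_MISSING)
    ]

lst = [[1, 2, 3, 4, 5], [2, 6, 7, 8], [1, 3, 6, 7, 8]]
by_time = ids_by_position(lst)
print(by_time[3])  # [4, 8, 7]
print(by_time[4])  # [5, 8]
```

The result has exactly as many entries as the longest sublist, so no upper bound needs to be known in advance.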
Answer 3:
If you want a one-line for loop that collects the result in a list, you can do this:
list2 = [[item[i] for item in list if len(item) > i] for i in range(0, 100)]
And if you want to know which id is from which list you can do this:
list2 = [{idx: item[i] for idx, item in enumerate(list) if len(item) > i} for i in range(0, 100)]
list2 would be like this:
[{0: 1, 1: 2, 2: 1}, {0: 2, 1: 6, 2: 3}, {0: 3, 1: 7, 2: 6}, {0: 4, 1: 8, 2: 7},
{0: 5, 2: 8}, {}, {}, ... ]
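The trailing empty dicts come from the hard-coded upper bound of 100. A possible variant, sketched here under the assumption that only positions that actually exist are wanted, takes the bound from the longest sublist; the function name ids_with_source is hypothetical, and enumerate supplies the sublist index directly:

```python
def ids_with_source(sublists):
    """Map each existing position to a {sublist index: ID} dict."""
    # Length of the longest sublist; 0 if there are no sublists at all.
    longest = max((len(s) for s in sublists), default=0)
    return [
        {idx: s[i] for idx, s in enumerate(sublists) if len(s) > i}
        for i in range(longest)
    ]

data = [[1, 2, 3, 4, 5], [2, 6, 7, 8], [1, 3, 6, 7, 8]]
result = ids_with_source(data)
print(len(result))  # 5
print(result[4])    # {0: 5, 2: 8}
```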
Answer 4:
You could pad your short lists with numpy.nan and afterwards create a numpy array:
import itertools
import numpy
lst = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8,9]]
arr = numpy.array(list(itertools.zip_longest(*lst, fillvalue=numpy.nan)))
Afterwards you can use numpy slicing as usual. Because of the nan padding, the array has a float dtype.
print(arr)
print(arr[1, :])  # [2., 6., 3.]
print(arr[4, :])  # [5., nan, 8.]
print(arr[5, :])  # [nan, nan, 9.]
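Since the question uses the IDs to index another array, the NaN padding has to be removed and the values cast back to integers first. A minimal sketch of that step, assuming Python 3 (itertools.zip_longest) and a hypothetical helper name ids_at:

```python
from itertools import zip_longest

import numpy as np

lst = [[1, 2, 3, 4, 5], [2, 6, 7, 8], [1, 3, 6, 7, 8, 9]]
# Rows are positions (time steps), columns are the original sublists.
arr = np.array(list(zip_longest(*lst, fillvalue=np.nan)))

def ids_at(arr, t):
    """Return the valid IDs in row t as integers, dropping the NaN padding."""
    row = arr[t]
    return row[~np.isnan(row)].astype(int)

print(ids_at(arr, 4))  # [5 8]
print(ids_at(arr, 5))  # [9]
```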
Answer 5:
Approach #1
One almost* vectorized approach is to build a position ID for every element, sort the flattened list by that ID with a stable sort, and split the result, like so -
import numpy as np

def position_based_slice(L):
    # Get lengths of each element in input list
    lens = np.array([len(item) for item in L])
    # Form ID array that ramps from 1 within each element and restarts
    # at 1 whenever a new element begins
    id_arr = np.ones(lens.sum(), int)
    id_arr[lens[:-1].cumsum()] = -lens[:-1] + 1
    pos_ids = id_arr.cumsum()
    # Get order-maintaining (stable) sorted indices for the flattened list
    ids = np.argsort(pos_ids, kind='mergesort')
    # Get sorted version and split wherever the position ID changes
    vals = np.take(np.concatenate(L), ids)
    cut_idx = np.where(np.diff(pos_ids[ids]) > 0)[0] + 1
    return np.split(vals, cut_idx)
*There is a list comprehension at the start, but since it only collects the lengths of the input elements, its effect on the total runtime should be minimal.
Sample run -
In [76]: input_list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8],[3,2]]
In [77]: position_based_slice(input_list)
Out[77]:
[array([1, 2, 1, 3]), # input_list[ID=0]
array([2, 6, 3, 2]), # input_list[ID=1]
array([3, 7, 6]), # input_list[ID=2]
array([4, 8, 7]), # input_list[ID=3]
array([5, 8])] # input_list[ID=4]
Approach #2
Here's another approach that creates a 2D array, which is easier to index and trace back to original input elements. This uses NumPy broadcasting along with boolean indexing. The implementation would look something like this -
def position_based_slice_2Dgrid(L):
    # Get lengths of each element in input list
    lens = np.array([len(item) for item in L])
    # Create a mask of valid places in a 2D grid mapped version of list
    mask = lens[:,None] > np.arange(lens.max())
    out = np.full(mask.shape,-1,dtype=int)
    out[mask] = np.concatenate(L)
    return out
Sample run -
In [126]: input_list = [[1,2,3,4,5],[2,6,7,8],[1,3,6,7,8],[3,2]]
In [127]: position_based_slice_2Dgrid(input_list)
Out[127]:
array([[ 1, 2, 3, 4, 5],
[ 2, 6, 7, 8, -1],
[ 1, 3, 6, 7, 8],
[ 3, 2, -1, -1, -1]])
So, now each column of the output holds the IDs at one position (time step), with -1 marking sublists that have no entry there.
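To connect this back to the question's pos = pos[satIDs] step, a column of the grid can be stripped of its -1 padding and used as an index. This is only a sketch: the grid is copied from the sample run above, pos is a made-up array in which row i holds the position of particle ID i, and positions_at is a hypothetical helper. It also assumes real IDs are never negative.

```python
import numpy as np

# Grid copied from the sample run: rows are sublists, -1 is padding.
grid = np.array([[1, 2, 3, 4, 5],
                 [2, 6, 7, 8, -1],
                 [1, 3, 6, 7, 8],
                 [3, 2, -1, -1, -1]])

# Hypothetical position array: row i is the (x, y, z) position of particle ID i.
pos = np.arange(10 * 3, dtype=float).reshape(10, 3)

def positions_at(grid, pos, t):
    """Return the IDs present in column t and the matching rows of pos."""
    ids = grid[:, t]
    ids = ids[ids >= 0]  # drop the -1 padding
    return ids, pos[ids]

ids, selected = positions_at(grid, pos, 4)
print(ids)             # [5 8]
print(selected.shape)  # (2, 3)
```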