What I would like to do is process each list separately. For example, if I have 5 lists ([1,2,3,4,5,6],[2,3,4,5,6],[3,4,5,6],[4,5,6],[5,6])
and I would like to get the same 5 lists without the 6, I would pad them to the same length and do something like:
data=[1,2,3,4,5,6]+[2,3,4,5,6,7]+[3,4,5,6,7,8]+[4,5,6,7,8,9]+[5,6,7,8,9,10]

def function_1(iter_listoflist):
    final_iterator=[]
    for sublist in iter_listoflist:
        final_iterator.append([x for x in sublist if x!=6])
    return iter(final_iterator)

sc.parallelize(data,5).glom().mapPartitions(function_1).collect()
Then I would trim the extra elements from each list to get the original lists back. Is there a simpler way to keep the computation for each list separate? I don't want the lists to mix, and they might be of different sizes.
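To make the goal concrete, here is a minimal plain-Python sketch (no Spark involved) of the result I am after, using the five lists from my example. The helper name drop_value is only a placeholder for illustration; what I am looking for is a way to get Spark to treat each sublist as its own unit in the same manner.

```python
# Plain-Python illustration of the desired result (no Spark involved).
# The five lists have different sizes and must not be mixed together.
lists = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6], [3, 4, 5, 6], [4, 5, 6], [5, 6]]

def drop_value(sublist, value=6):
    """Return a copy of sublist without any element equal to value.

    drop_value is a hypothetical helper name used only for this sketch.
    """
    return [x for x in sublist if x != value]

# Each sublist is processed on its own, so the list boundaries are preserved.
result = [drop_value(sublist) for sublist in lists]
print(result)
# [[1, 2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5], [4, 5], [5]]
```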
Thank you,
Philippe