I'm trying to get my head around the amazing list-processing abilities of Python (and eventually NumPy). I'm converting some C code I wrote to Python.
I have a text data file where the first row is a header, and then every odd row is my input data and every even row is my output data. All data is space-separated. I'm quite chuffed that I managed to read all the data into lists using nested list comprehensions. Amazing stuff.
with open('data.txt', 'r') as f:
    # get all lines as a list of strings
    lines = list(f)

# convert header row to list of ints and get info
# (list() is needed because map() returns an iterator in Python 3)
header = list(map(int, lines[0].split()))
num_samples = header[0]
input_dim = header[1]
output_dim = header[2]
del header

# bad ass list comprehensions
inputs = [[float(x) for x in l.split()] for l in lines[1::2]]
outputs = [[float(x) for x in l.split()] for l in lines[2::2]]
# comprehension variables (x, l) do not leak in Python 3, so only lines is deleted
del lines
Then I want to produce a new list where each element is a function of the corresponding input-output pair. I couldn't figure out how to do this with any Python-specific optimizations. Here it is in C-style Python:
import math

# calculate position
pos_list = []
pos_y = 0
for i in range(num_samples):
    pantilt = outputs[i]  # note: this aliases the row, so outputs is modified in place
    target = inputs[i]
    # fold pan angles outside [-90, 90] back into range and flip the tilt
    if pantilt[0] > 90:
        pantilt[0] -= 180
        pantilt[1] *= -1
    elif pantilt[0] < -90:
        pantilt[0] += 180
        pantilt[1] *= -1
    tan_pan = math.tan(math.radians(pantilt[0]))
    tan_tilt = math.tan(math.radians(pantilt[1]))
    pos = [0, pos_y, 0]
    pos[2] = tan_tilt * (target[1] - pos[1]) / math.sqrt(tan_pan * tan_pan + 1)
    pos[0] = pos[2] * tan_pan
    pos[0] += target[0]
    pos[2] += target[2]
    pos_list.append(pos)
del pantilt, target, tan_pan, tan_tilt, pos, pos_y
I tried to do it with a comprehension or map, but couldn't figure out how to:
- draw from two different lists (both input and output) for each element of the pos_list array
- put the body of the algorithm in the comprehension. Would it have to be a separate function, or is there a funky way of using lambdas for this?
- Would it even be possible to do this with no loops at all, just stick it in NumPy and vectorize the whole thing?
If anyone stumbles upon the same question, here are four variations based on Ami's suggestion (functions do1, do1b, do2, do3).
And for those curious, here are the benchmarks (I have ~1000 input-output pairs of data; with radically more data the benchmarks might vary more).
....
One vectorized approach uses boolean indexing (a mask).
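A minimal sketch of what such a masked, vectorized version of the question's loop might look like. The function name `positions_vectorized`, the `pos_y` parameter, and the sample values are assumptions for illustration, and it assumes every input row has at least three values and every output row has the same length (pan, tilt):

```python
import numpy as np


def positions_vectorized(inputs, outputs, pos_y=0.0):
    """Vectorized take on the question's loop; returns an (N, 3) array."""
    targets = np.asarray(inputs, dtype=float)
    pantilt = np.array(outputs, dtype=float)  # a copy, so the caller's data is untouched
    pan, tilt = pantilt[:, 0], pantilt[:, 1]  # views into that copy

    # Boolean masks pick out the rows the if/elif branches would handle.
    # Both are computed before any update, so they cannot overlap.
    high = pan > 90
    low = pan < -90
    pan[high] -= 180
    pan[low] += 180
    tilt[high | low] *= -1

    tan_pan = np.tan(np.radians(pan))
    tan_tilt = np.tan(np.radians(tilt))
    z = tan_tilt * (targets[:, 1] - pos_y) / np.sqrt(tan_pan ** 2 + 1)
    x = z * tan_pan
    y = np.full_like(z, pos_y)
    return np.column_stack((x + targets[:, 0], y, z + targets[:, 2]))


# Made-up sample data: two targets and two pan/tilt pairs (degrees).
inputs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
outputs = [[45.0, 45.0], [135.0, 45.0]]
pos_array = positions_vectorized(inputs, outputs)
```

Unlike the loop in the question, this sketch works on a copy of the pan/tilt data, so `outputs` is left unchanged.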
-Runtime tests
Suppose you read your file into a list, like so:
The header is this:
The even lines are:
and the odd lines are:
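As a rough sketch of these steps, assuming the layout described in the question (one header row, then alternating input and output rows). The helper name `parse_samples` and the inline sample lines are made up for illustration; with a real file, `lines` would come from `list(f)` inside a `with open(...)` block:

```python
def parse_samples(lines):
    """Split raw text lines into (header, inputs, outputs)."""
    # First line: num_samples, input_dim, output_dim as integers.
    header = [int(tok) for tok in lines[0].split()]
    # After the header, rows alternate: input, output, input, output, ...
    inputs = [[float(tok) for tok in line.split()] for line in lines[1::2]]
    outputs = [[float(tok) for tok in line.split()] for line in lines[2::2]]
    return header, inputs, outputs


# Inline sample standing in for the file contents (made-up values).
sample_lines = [
    "2 3 2\n",
    "1.0 2.0 3.0\n",
    "10.0 20.0\n",
    "4.0 5.0 6.0\n",
    "100.0 -30.0\n",
]
header, inputs, outputs = parse_samples(sample_lines)
```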
Now you can combine these two lists using itertools.izip (in Python 3, the built-in zip does the same job). The result is a sort of list-like thingy (you can loop over it, or just write list( ... ) around it to make it into a true list), where each entry is a pair of your input-output data.
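One way this pairing might be used, sketched under assumptions: the loop body from the question is moved into a helper (the name `position` and the sample values are hypothetical), and a list comprehension over `zip` draws from both lists at once. This is written for Python 3; on Python 2, `itertools.izip` could be swapped in for `zip`.

```python
import math


def position(target, pantilt, pos_y=0.0):
    """Compute one position from an input (target) / output (pantilt) pair."""
    pan, tilt = pantilt[0], pantilt[1]
    # Fold pan angles outside [-90, 90] back into range, flipping the tilt.
    if pan > 90:
        pan -= 180
        tilt = -tilt
    elif pan < -90:
        pan += 180
        tilt = -tilt
    tan_pan = math.tan(math.radians(pan))
    tan_tilt = math.tan(math.radians(tilt))
    z = tan_tilt * (target[1] - pos_y) / math.sqrt(tan_pan * tan_pan + 1)
    x = z * tan_pan
    return [x + target[0], pos_y, z + target[2]]


# Made-up sample data: two targets and two pan/tilt pairs (degrees).
inputs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
outputs = [[45.0, 45.0], [135.0, 45.0]]

# zip yields (target, pantilt) pairs, one per sample.
pos_list = [position(target, pantilt) for target, pantilt in zip(inputs, outputs)]
```

Because the helper copies the two angles into local variables, it does not modify the rows of `outputs`, which the original loop did. A lambda would be awkward here since the body needs an if/elif, so a named function is the simpler choice.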