What is the most memory efficient way to combine r

Posted 2019-08-01 07:07

Question:

I am looking for the most memory-efficient way to read a PyTables table (columns: x, y, z) in sorted order (the z column has a CSI) and evaluate an expression like

x+a*y+b*z

where a and b are constants. So far my only solution has been to copy the entire table with the "sortby=z" flag and then evaluate the expression piece-wise on the copy.

Note: I want to keep the result x+a*y+b*z in memory so that I can run some reduction operations on it that are not available directly in PyTables, and then save it into a new PyTables table.

Answer 1:

There are two basic options, depending on whether you need to iterate in sorted order or not.

If you need to iterate over the table in sorted order, then reading the data in will be much more expensive than computing the expression. So read the rows with Table.read_sorted() and compute the expression in a list comprehension, or similar:

res = [row['x'] + a*row['y'] + b*row['z']
       for row in tab.read_sorted('z', checkCSI=True)]

If you don't need to iterate in sorted order (and it doesn't look like you do), set up and evaluate the expression with the Expr class, read the sorted row indices from the column's CSI, and apply them to the expression results. This would look something like:

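import tables as tb

x = tab.cols.x
y = tab.cols.y
z = tab.cols.z
# Expr looks up x, y, z, a and b in the calling namespace
expr = tb.Expr('x+a*y+b*z')
unsorted_res = expr.eval()
# row numbers in z-sorted order, read from the index attached to the column
idx = z.index.read_indices()
sorted_res = unsorted_res[idx]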
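
To see why the second option gives the same answer as sorting first, here is a minimal sketch in plain Python, with no PyTables involved. The lists x, y, z, the constants, and the helper evaluate() are made-up stand-ins for the table columns and the Expr call. The permutation idx plays the role of the row indices read from the CSI.

```python
# Stand-in data: three plain lists play the role of the table columns.
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
z = [3, 1, 4, 2]
a, b = 2, 3  # hypothetical constants


def evaluate(xs, ys, zs, a, b):
    """Evaluate x + a*y + b*z element-wise, in the order the rows are given."""
    return [xi + a * yi + b * zi for xi, yi, zi in zip(xs, ys, zs)]


# Step 1: evaluate the expression in storage (unsorted) order.
unsorted_res = evaluate(x, y, z, a, b)

# Step 2: build the permutation that sorts z (stand-in for the index read from the CSI).
idx = sorted(range(len(z)), key=z.__getitem__)

# Step 3: reorder the results with that permutation.
sorted_res = [unsorted_res[i] for i in idx]

# Reference: sort the rows by z first, then evaluate.
rows = sorted(zip(x, y, z), key=lambda r: r[2])
reference = evaluate(*zip(*rows), a, b)
```

Both routes give the same list, so the only thing that has to be held alongside the result is the index permutation, not a sorted copy of the whole table.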