What is the most memory efficient way to combine read_sorted and Expr in pytables?

Posted 2019-10-19 08:46

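I am looking for the most memory-efficient way to combine reading a PyTables table (columns x, y, z) in sorted order (the z column has a CSI) with evaluating an expression like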

x+a*y+b*z

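where a and b are constants. So far my only solution has been to copy the entire table with the "sortby=z" flag and then evaluate the expression piecewise on the copy.

Note: I want to keep the result x+a*y+b*z in memory so that I can apply some reduction operations to it that PyTables does not provide directly, and then save it to a new PyTables table.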

Answer 1:

There are two basic options, depending on whether you need to iterate in sorted order or not.

If you need to iterate over the table in sorted order, then reading the data in will be much more expensive than computing the expression. In that case, read efficiently with Table.read_sorted() and compute the expression in a list comprehension, or similar:

# a and b are the constants from the question; the result gets its own name
res = [row['x'] + a*row['y'] + b*row['z'] for row in
       tab.read_sorted('z', checkCSI=True)]
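
Since the question is about memory, here is a minimal sketch of the same idea done in chunks. It relies on the start and stop arguments of Table.read_sorted(), which select a slice of the sorted sequence. The function name iter_expr_sorted and the chunk_rows parameter are made up for illustration, and the sketch assumes your reduction can be accumulated chunk by chunk:

```python
def iter_expr_sorted(table, a, b, chunk_rows=100_000):
    """Yield x + a*y + b*z in ascending-z order, one chunk of rows at a time.

    `table` is assumed to be an open PyTables Table with columns x, y, z
    and a completely sorted index (CSI) on z.
    """
    for start in range(0, table.nrows, chunk_rows):
        stop = min(start + chunk_rows, table.nrows)
        # start/stop slice the sorted sequence, so at most chunk_rows
        # rows are read into memory per iteration.
        chunk = table.read_sorted('z', checkCSI=True, start=start, stop=stop)
        yield [row['x'] + a * row['y'] + b * row['z'] for row in chunk]
```

A reduction such as a running total could then be written as sum(sum(part) for part in iter_expr_sorted(tab, a, b)), and each chunk can be appended to the new table as it is produced instead of being kept around.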

If you don't need to iterate in sorted order (and it doesn't look like you do), set up and evaluate the expression with the Expr class, read the CSI from the column, and apply it to the expression results. This would look something like:

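import tables as tb

x = tab.cols.x
y = tab.cols.y
z = tab.cols.z
# Expr resolves x, y, z, a and b from the calling scope
expr = tb.Expr('x+a*y+b*z')
unsorted_res = expr.eval()
# row numbers in ascending-z order, read from the index on z
idx = z.index.read_indices()
sorted_res = unsorted_res[idx]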
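
To see why reordering the results afterwards gives the same answer as iterating in sorted order, here is a small check that does not need PyTables at all. Plain lists stand in for the columns, sorted() stands in for the index, and all values are made up for illustration:

```python
# Plain-Python stand-ins for the three columns (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 0.25, 2.0, 1.0]
z = [9.0, 3.0, 7.0, 1.0]
a, b = 2.0, 10.0

# Evaluate the expression once, in storage order.
unsorted_res = [xi + a * yi + b * zi for xi, yi, zi in zip(x, y, z)]

# Row numbers in ascending-z order; this plays the role of the index on z.
idx = sorted(range(len(z)), key=z.__getitem__)

# Reordering the already computed results...
sorted_res = [unsorted_res[i] for i in idx]

# ...matches evaluating the expression row by row in sorted order.
direct = [x[i] + a * y[i] + b * z[i] for i in idx]
assert sorted_res == direct
```

The expression is evaluated row by row, so it does not matter whether the rows are sorted before or after the evaluation.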


Source: What is the most memory efficient way to combine read_sorted and Expr in pytables?