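KDB+ has an aj (asof join) function that is typically used to join tables along a time column.
Here is an example: I have a trades table and a quotes table, and I get the prevailing quote for each trade.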
q)5# t
time sym price size
-----------------------------
09:30:00.439 NVDA 13.42 60511
09:30:00.439 NVDA 13.42 60511
09:30:02.332 NVDA 13.42 100
09:30:02.332 NVDA 13.42 100
09:30:02.333 NVDA 13.41 100
q)5# q
time sym bid ask bsize asize
-----------------------------------------
09:30:00.026 NVDA 13.34 13.44 3 16
09:30:00.043 NVDA 13.34 13.44 3 17
09:30:00.121 NVDA 13.36 13.65 1 10
09:30:00.386 NVDA 13.36 13.52 21 1
09:30:00.440 NVDA 13.4 13.44 15 17
q)5# aj[`time; t; q]
time sym price size bid ask bsize asize
-----------------------------------------------------
09:30:00.439 NVDA 13.42 60511 13.36 13.52 21 1
09:30:00.439 NVDA 13.42 60511 13.36 13.52 21 1
09:30:02.332 NVDA 13.42 100 13.34 13.61 1 1
09:30:02.332 NVDA 13.42 100 13.34 13.61 1 1
09:30:02.333 NVDA 13.41 100 13.34 13.51 1 1
How can I do the same operation using pandas? I am working with trade and quote DataFrames whose index is datetime64.
In [55]: quotes.head()
Out[55]:
bid ask bsize asize
2012-09-06 09:30:00.026000 13.34 13.44 3 16
2012-09-06 09:30:00.043000 13.34 13.44 3 17
2012-09-06 09:30:00.121000 13.36 13.65 1 10
2012-09-06 09:30:00.386000 13.36 13.52 21 1
2012-09-06 09:30:00.440000 13.40 13.44 15 17
In [56]: trades.head()
Out[56]:
price size
2012-09-06 09:30:00.439000 13.42 60511
2012-09-06 09:30:00.439000 13.42 60511
2012-09-06 09:30:02.332000 13.42 100
2012-09-06 09:30:02.332000 13.42 100
2012-09-06 09:30:02.333000 13.41 100
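I see that pandas has an asof function, but it is not defined on DataFrame, only on the Series object. I suppose one could loop through each Series and align them one by one, but I am wondering if there is a better way?
As you mentioned in the question, looping through each column should work for you: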
df1.apply(lambda x: x.asof(df2.index))
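We could potentially create a faster, NaN-naive version of DataFrame.asof that handles all the columns in one shot. But for now, I think this is the most straightforward way.
I wrote an under-advertised ordered_merge function some time ago: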
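As a rough sketch of the per-column approach, the snippet below rebuilds the five quote rows and five trade rows shown in the question and calls Series.asof once per quote column. The names `prevailing` and `result` are only illustrative. Because it uses just the five quote rows printed above, the later trades pick up the 09:30:00.440 quote rather than the values in the full q output.

```python
import pandas as pd

quotes = pd.DataFrame(
    {
        "bid": [13.34, 13.34, 13.36, 13.36, 13.40],
        "ask": [13.44, 13.44, 13.65, 13.52, 13.44],
        "bsize": [3, 3, 1, 21, 15],
        "asize": [16, 17, 10, 1, 17],
    },
    index=pd.to_datetime([
        "2012-09-06 09:30:00.026",
        "2012-09-06 09:30:00.043",
        "2012-09-06 09:30:00.121",
        "2012-09-06 09:30:00.386",
        "2012-09-06 09:30:00.440",
    ]),
)

trades = pd.DataFrame(
    {
        "price": [13.42, 13.42, 13.42, 13.42, 13.41],
        "size": [60511, 60511, 100, 100, 100],
    },
    index=pd.to_datetime([
        "2012-09-06 09:30:00.439",
        "2012-09-06 09:30:00.439",
        "2012-09-06 09:30:02.332",
        "2012-09-06 09:30:02.332",
        "2012-09-06 09:30:02.333",
    ]),
)

# For each quote column, look up the last value at or before each trade time.
# .to_numpy() attaches the values by position, which sidesteps index alignment
# on the duplicated trade timestamps.
prevailing = {
    col: quotes[col].asof(trades.index).to_numpy() for col in quotes.columns
}
result = trades.assign(**prevailing)
print(result)
```

Series.asof requires the quotes index to be sorted, so sort it first if that is not guaranteed.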
In [27]: quotes
Out[27]:
time bid ask bsize asize
0 2012-09-06 09:30:00.026000 13.34 13.44 3 16
1 2012-09-06 09:30:00.043000 13.34 13.44 3 17
2 2012-09-06 09:30:00.121000 13.36 13.65 1 10
3 2012-09-06 09:30:00.386000 13.36 13.52 21 1
4 2012-09-06 09:30:00.440000 13.40 13.44 15 17
In [28]: trades
Out[28]:
time price size
0 2012-09-06 09:30:00.439000 13.42 60511
1 2012-09-06 09:30:00.439000 13.42 60511
2 2012-09-06 09:30:02.332000 13.42 100
3 2012-09-06 09:30:02.332000 13.42 100
4 2012-09-06 09:30:02.333000 13.41 100
In [29]: ordered_merge(quotes, trades)
Out[29]:
time bid ask bsize asize price size
0 2012-09-06 09:30:00.026000 13.34 13.44 3 16 NaN NaN
1 2012-09-06 09:30:00.043000 13.34 13.44 3 17 NaN NaN
2 2012-09-06 09:30:00.121000 13.36 13.65 1 10 NaN NaN
3 2012-09-06 09:30:00.386000 13.36 13.52 21 1 NaN NaN
4 2012-09-06 09:30:00.439000 NaN NaN NaN NaN 13.42 60511
5 2012-09-06 09:30:00.439000 NaN NaN NaN NaN 13.42 60511
6 2012-09-06 09:30:00.440000 13.40 13.44 15 17 NaN NaN
7 2012-09-06 09:30:02.332000 NaN NaN NaN NaN 13.42 100
8 2012-09-06 09:30:02.332000 NaN NaN NaN NaN 13.42 100
9 2012-09-06 09:30:02.333000 NaN NaN NaN NaN 13.41 100
In [32]: ordered_merge(quotes, trades, fill_method='ffill')
Out[32]:
time bid ask bsize asize price size
0 2012-09-06 09:30:00.026000 13.34 13.44 3 16 NaN NaN
1 2012-09-06 09:30:00.043000 13.34 13.44 3 17 NaN NaN
2 2012-09-06 09:30:00.121000 13.36 13.65 1 10 NaN NaN
3 2012-09-06 09:30:00.386000 13.36 13.52 21 1 NaN NaN
4 2012-09-06 09:30:00.439000 13.36 13.52 21 1 13.42 60511
5 2012-09-06 09:30:00.439000 13.36 13.52 21 1 13.42 60511
6 2012-09-06 09:30:00.440000 13.40 13.44 15 17 13.42 60511
7 2012-09-06 09:30:02.332000 13.40 13.44 15 17 13.42 100
8 2012-09-06 09:30:02.332000 13.40 13.44 15 17 13.42 100
9 2012-09-06 09:30:02.333000 13.40 13.44 15 17 13.41 100
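It could easily (well, for someone who is familiar with the code) be extended to do a "left join" that mimics KDB. I realize that forward-filling the trade data is not appropriate in this case; I am just illustrating the function.
pandas 0.19 introduced an asof join: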
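One way such a "left join" could be approximated, as a sketch rather than the extension the author had in mind: merge in time order without filling, forward-fill only the quote columns, then keep only the trade rows. This assumes a recent pandas, where ordered_merge is available as pd.merge_ordered, and it assumes a non-null price marks a trade row. The names `quote_cols` and `left_joined` are hypothetical.

```python
import pandas as pd

quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-09-06 09:30:00.026",
        "2012-09-06 09:30:00.043",
        "2012-09-06 09:30:00.121",
        "2012-09-06 09:30:00.386",
        "2012-09-06 09:30:00.440",
    ]),
    "bid": [13.34, 13.34, 13.36, 13.36, 13.40],
    "ask": [13.44, 13.44, 13.65, 13.52, 13.44],
    "bsize": [3, 3, 1, 21, 15],
    "asize": [16, 17, 10, 1, 17],
})

trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-09-06 09:30:00.439",
        "2012-09-06 09:30:00.439",
        "2012-09-06 09:30:02.332",
        "2012-09-06 09:30:02.332",
        "2012-09-06 09:30:02.333",
    ]),
    "price": [13.42, 13.42, 13.42, 13.42, 13.41],
    "size": [60511, 60511, 100, 100, 100],
})

# Outer merge in time order, leaving gaps as NaN (no fill_method).
merged = pd.merge_ordered(quotes, trades, on="time")

# Forward-fill the quote columns only, so trade data is never carried forward.
quote_cols = ["bid", "ask", "bsize", "asize"]
merged[quote_cols] = merged[quote_cols].ffill()

# Keep the rows that came from trades (assumption: price is never missing).
left_joined = merged[merged["price"].notna()].reset_index(drop=True)
print(left_joined)
```

Note that the forward-fill turns the integer quote sizes into floats, because the intermediate frame contains NaN.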
pd.merge_asof(trades, quotes, on='time')
The semantics are very similar to the q/kdb+ functionality.
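For a self-contained illustration, the sketch below runs merge_asof on the five quote and trade rows from the question, once on a time column and once on a datetime index as in the original DataFrames. The variable names are illustrative, and because only the five printed quote rows are used, the later trades match the 09:30:00.440 quote.

```python
import pandas as pd

quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-09-06 09:30:00.026",
        "2012-09-06 09:30:00.043",
        "2012-09-06 09:30:00.121",
        "2012-09-06 09:30:00.386",
        "2012-09-06 09:30:00.440",
    ]),
    "bid": [13.34, 13.34, 13.36, 13.36, 13.40],
    "ask": [13.44, 13.44, 13.65, 13.52, 13.44],
    "bsize": [3, 3, 1, 21, 15],
    "asize": [16, 17, 10, 1, 17],
})

trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2012-09-06 09:30:00.439",
        "2012-09-06 09:30:00.439",
        "2012-09-06 09:30:02.332",
        "2012-09-06 09:30:02.332",
        "2012-09-06 09:30:02.333",
    ]),
    "price": [13.42, 13.42, 13.42, 13.42, 13.41],
    "size": [60511, 60511, 100, 100, 100],
})

# Each trade row gets the last quote whose time is <= the trade time.
# Both frames must already be sorted by the key column.
by_column = pd.merge_asof(trades, quotes, on="time")
print(by_column)

# The same join when the timestamps live in the index, as in the question.
by_index = pd.merge_asof(
    trades.set_index("time"),
    quotes.set_index("time"),
    left_index=True,
    right_index=True,
)
print(by_index)
```

If the tables hold several symbols, merge_asof also accepts a `by` argument (for example `by="sym"`) so that each trade is matched only against quotes for the same symbol.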