Vectorized look-up of values in Pandas dataframe

2019-01-16 13:24发布

问题:

I have two pandas dataframes one called 'orders' and another one called 'daily_prices'. daily_prices is as follows:

              AAPL    GOOG     IBM    XOM
2011-01-10  339.44  614.21  142.78  71.57
2011-01-13  342.64  616.69  143.92  73.08
2011-01-26  340.82  616.50  155.74  75.89
2011-02-02  341.29  612.00  157.93  79.46
2011-02-10  351.42  616.44  159.32  79.68
2011-03-03  356.40  609.56  158.73  82.19
2011-05-03  345.14  533.89  167.84  82.00
2011-06-03  340.42  523.08  160.97  78.19
2011-06-10  323.03  509.51  159.14  76.84
2011-08-01  393.26  606.77  176.28  76.67
2011-12-20  392.46  630.37  184.14  79.97

orders is as follows:

           direction  size ticker  prices
2011-01-10       Buy  1500   AAPL  339.44
2011-01-13      Sell  1500   AAPL  342.64
2011-01-13       Buy  4000    IBM  143.92
2011-01-26       Buy  1000   GOOG  616.50
2011-02-02      Sell  4000    XOM   79.46
2011-02-10       Buy  4000    XOM   79.68
2011-03-03      Sell  1000   GOOG  609.56
2011-03-03      Sell  2200    IBM  158.73
2011-06-03      Sell  3300    IBM  160.97
2011-05-03       Buy  1500    IBM  167.84
2011-06-10       Buy  1200   AAPL  323.03
2011-08-01       Buy    55   GOOG  606.77
2011-08-01      Sell    55   GOOG  606.77
2011-12-20      Sell  1200   AAPL  392.46

index of both dataframes is datetime.date. 'prices' column in the 'orders' dataframe was added by using a list comprehension to loop through all the orders and look up the specific ticker for the specific date in the 'daily_prices' data frame and then adding that list as a column to the 'orders' dataframe. I would like to do this using an array operation rather than something that loops. can it be done? i tried to use:

daily_prices.ix[dates,tickers]

but this returns a matrix of cartesian product of the two lists. i want it to return a column vector of only the price of a specified ticker for a specified date.

回答1:

Use our friend lookup, designed precisely for this purpose:

In [17]: prices
Out[17]: 
              AAPL    GOOG     IBM    XOM
2011-01-10  339.44  614.21  142.78  71.57
2011-01-13  342.64  616.69  143.92  73.08
2011-01-26  340.82  616.50  155.74  75.89
2011-02-02  341.29  612.00  157.93  79.46
2011-02-10  351.42  616.44  159.32  79.68
2011-03-03  356.40  609.56  158.73  82.19
2011-05-03  345.14  533.89  167.84  82.00
2011-06-03  340.42  523.08  160.97  78.19
2011-06-10  323.03  509.51  159.14  76.84
2011-08-01  393.26  606.77  176.28  76.67
2011-12-20  392.46  630.37  184.14  79.97

In [18]: orders
Out[18]: 
                  Date direction  size ticker  prices
0  2011-01-10 00:00:00       Buy  1500   AAPL  339.44
1  2011-01-13 00:00:00      Sell  1500   AAPL  342.64
2  2011-01-13 00:00:00       Buy  4000    IBM  143.92
3  2011-01-26 00:00:00       Buy  1000   GOOG  616.50
4  2011-02-02 00:00:00      Sell  4000    XOM   79.46
5  2011-02-10 00:00:00       Buy  4000    XOM   79.68
6  2011-03-03 00:00:00      Sell  1000   GOOG  609.56
7  2011-03-03 00:00:00      Sell  2200    IBM  158.73
8  2011-06-03 00:00:00      Sell  3300    IBM  160.97
9  2011-05-03 00:00:00       Buy  1500    IBM  167.84
10 2011-06-10 00:00:00       Buy  1200   AAPL  323.03
11 2011-08-01 00:00:00       Buy    55   GOOG  606.77
12 2011-08-01 00:00:00      Sell    55   GOOG  606.77
13 2011-12-20 00:00:00      Sell  1200   AAPL  392.46

In [19]: prices.lookup(orders.Date, orders.ticker)
Out[19]: 
array([ 339.44,  342.64,  143.92,  616.5 ,   79.46,   79.68,  609.56,
        158.73,  160.97,  167.84,  323.03,  606.77,  606.77,  392.46])


回答2:

Below for everyone's convenience to reproduce the results.

columns=['AAPL','GOOG','IBM','XOM']
index = ['2011-01-10','2011-01-13','2011-01-26','2011-02-02','2011-02-10','2011-03-03','2011-05-03','2011-06-03','2011-06-10','2011-08-01','2011-12-20']
prices = pd.DataFrame(columns=columns, index=index)
prices.iloc[0]=[339.44,614.21,142.78,71.57]
prices.iloc[1]=[342.64,616.69,143.92,73.08]
prices.iloc[2]=[340.82,616.50,155.74,75.89]
prices.iloc[3]=[341.29,612.00,157.93,79.46]
prices.iloc[4]=[351.42,616.44,159.32,79.68]
prices.iloc[5]=[356.40,609.56,158.73,82.19]
prices.iloc[6]=[345.14,533.89,167.84,82.00]
prices.iloc[7]=[340.42,523.08,160.97,78.19]
prices.iloc[8]=[323.03,509.51,159.14,76.84]
prices.iloc[9]=[393.26,606.77,176.28,76.67]
prices.iloc[10]=[392.46,630.37,184.14,79.97]

columns=['Date','direction','size','ticker','prices']
orders = pd.DataFrame(columns=columns)
orders.loc[0] = ['2011-01-10','Buy',1500,'AAPL',339.44]
orders.loc[1] = ['2011-01-13','Sell',1500,'AAPL',342.64]
orders.loc[2] = ['2011-01-13','Buy',4000,'IBM',143.92]
orders.loc[3] = ['2011-01-26','Buy',1000,'GOOG',616.50]
orders.loc[4] = ['2011-02-02','Sell',4000,'XOM',79.46]
orders.loc[5] = ['2011-02-10','Buy',4000,'XOM',79.68]
orders.loc[6] = ['2011-03-03','Sell',1000,'GOOG',609.56]
orders.loc[7] = ['2011-03-03','Sell',2200,'IBM',158.73]
orders.loc[8] = ['2011-06-03','Sell',3300,'IBM',160.97]
orders.loc[9] = ['2011-05-03','Buy',1500,'IBM',167.84]
orders.loc[10] = ['2011-06-10','Buy',1200,'AAPL',323.03]
orders.loc[11] = ['2011-08-01','Buy',55,'GOOG',606.77]
orders.loc[12] = ['2011-08-01','Sell',55,'GOOG',606.77]
orders.loc[13] = ['2011-12-20','Sell',1200,'AAPL',392.46]

lookupValues = prices.lookup(orders.Date, orders.ticker)

Then result:

>>> prices
              AAPL    GOOG     IBM    XOM
2011-01-10  339.44  614.21  142.78  71.57
2011-01-13  342.64  616.69  143.92  73.08
2011-01-26  340.82   616.5  155.74  75.89
2011-02-02  341.29     612  157.93  79.46
2011-02-10  351.42  616.44  159.32  79.68
2011-03-03   356.4  609.56  158.73  82.19
2011-05-03  345.14  533.89  167.84     82
2011-06-03  340.42  523.08  160.97  78.19
2011-06-10  323.03  509.51  159.14  76.84
2011-08-01  393.26  606.77  176.28  76.67
2011-12-20  392.46  630.37  184.14  79.97
>>> orders
          Date direction  size ticker  prices
0   2011-01-10       Buy  1500   AAPL  339.44
1   2011-01-13      Sell  1500   AAPL  342.64
2   2011-01-13       Buy  4000    IBM  143.92
3   2011-01-26       Buy  1000   GOOG  616.50
4   2011-02-02      Sell  4000    XOM   79.46
5   2011-02-10       Buy  4000    XOM   79.68
6   2011-03-03      Sell  1000   GOOG  609.56
7   2011-03-03      Sell  2200    IBM  158.73
8   2011-06-03      Sell  3300    IBM  160.97
9   2011-05-03       Buy  1500    IBM  167.84
10  2011-06-10       Buy  1200   AAPL  323.03
11  2011-08-01       Buy    55   GOOG  606.77
12  2011-08-01      Sell    55   GOOG  606.77
13  2011-12-20      Sell  1200   AAPL  392.46
>>> lookupValues
array([339.44, 342.64, 143.92, 616.5 ,  79.46,  79.68, 609.56, 158.73,
       160.97, 167.84, 323.03, 606.77, 606.77, 392.46])
>>>