How to look back at previous rows from within Pandas

Posted 2019-04-14 01:17

Question:

I am researching/backtesting a trading system.

I have a Pandas dataframe containing OHLC data and have added several calculated columns which identify price patterns that I will use as signals to initiate positions.

I would now like to add a further column that keeps track of the current net position. I have tried using df.apply(), passing the dataframe itself as the argument instead of the row object, because with the row object I seem unable to look back at previous rows to determine whether they produced any price patterns:

from collections import namedtuple

open_campaigns = []
Campaign = namedtuple('Campaign', 'open position stop')

def calc_position(df):
  # sum of current positions + any new positions

  if entered_long(df):
    open_campaigns.append(
        Campaign(
            calc_long_open(df.High.shift(1)), 
            calc_position_size(df), 
            calc_long_isl(df)
        )
    )

  return sum(campaign.position for campaign in open_campaigns)

def entered_long(df):
  return buy_pattern(df) & (df.High > df.High.shift(1))

df["Position"] = df.apply(lambda row: calc_position(df), axis=1)

However, this raises the following error:

ValueError: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', u'occurred at index 1997-07-16 08:00:00')

Rolling window functions would seem to be a natural fit, but as I understand it they only act on a single time series or column, so they wouldn't work either: I need to access the values of multiple columns at multiple time points.

How should I in fact be doing this?

Answer 1:

This problem has its roots in NumPy.

def entered_long(df):
  return buy_pattern(df) & (df.High > df.High.shift(1))

entered_long returns an array-like object of booleans, not a single bool, and NumPy refuses to guess whether a whole array counts as True or False:

In [48]: x = np.array([ True,  True,  True], dtype=bool)

In [49]: bool(x)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
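A pandas boolean Series, like the one entered_long builds from df.High > df.High.shift(1), follows the same rule. A minimal sketch with a made-up three-element Series, showing that an implicit truth test raises while a single selected element does not:

```python
import pandas as pd

# A boolean Series, standing in for the result of entered_long(df).
s = pd.Series([True, False, True])

try:
    if s:  # implicit bool(s), which is what `if entered_long(df):` does
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # pandas' wording of the same "truth value ... is ambiguous" error

# Selecting a single element gives a scalar, which `if` accepts.
first_is_true = bool(s.iloc[0])
print(first_is_true)
```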

To fix this, use any() or all() to state what it means for the array to be True:

def calc_position(df):
  # sum of current positions + any new positions

  if entered_long(df).any():  # or .all()
    ...  # append the new Campaign as before

The any() method will return True if any of the items in entered_long(df) are True. The all() method will return True if all the items in entered_long(df) are True.
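One caveat: because calc_position receives the whole dataframe on every call, entered_long(df).any() evaluates to the same value for every row, so the error goes away but each row still cannot tell whether *it* triggered an entry. Since the open campaigns are state carried from bar to bar, one straightforward alternative is to compute the signals vectorized with shift() (which lines each row up with earlier rows of any column) and then track positions in an explicit loop. A minimal sketch under stated assumptions: the columns BuyPattern, Size and Stop are hypothetical stand-ins for buy_pattern(df), calc_position_size(df) and calc_long_isl(df); PrevHigh, EnteredLong and track_positions are illustrative names; and the exit rule (close a campaign once Low reaches its stop) is assumed, not taken from the question:

```python
from collections import namedtuple

import pandas as pd

Campaign = namedtuple("Campaign", "open position stop")

# Hypothetical bars and precomputed inputs (not real market data).
df = pd.DataFrame(
    {
        "High": [10.0, 11.5, 11.0, 12.0, 12.5],
        "Low": [9.0, 10.0, 10.2, 9.5, 11.9],
        "BuyPattern": [False, True, True, True, False],
        "Size": [0, 100, 0, 50, 0],
        "Stop": [0.0, 9.8, 0.0, 9.2, 0.0],
    }
)

# Vectorized look-back: shift(1) aligns each row with the previous row,
# so conditions can mix several columns at several time points.
df["PrevHigh"] = df["High"].shift(1)
df["EnteredLong"] = df["BuyPattern"] & (df["High"] > df["PrevHigh"])


def track_positions(df):
    """Return the net position after each bar (hypothetical helper)."""
    open_campaigns = []
    positions = []
    for row in df.itertuples():
        # Assumed exit rule: drop campaigns whose stop this bar's Low reached.
        open_campaigns = [c for c in open_campaigns if row.Low > c.stop]
        if row.EnteredLong:
            # Previous bar's High as the entry reference, echoing the
            # question's calc_long_open(df.High.shift(1)) argument.
            open_campaigns.append(Campaign(row.PrevHigh, row.Size, row.Stop))
        positions.append(sum(c.position for c in open_campaigns))
    return positions


df["Position"] = track_positions(df)
print(df[["High", "Low", "EnteredLong", "Position"]])
```

The loop is slower than pure vectorized code, but only the stateful bookkeeping runs in Python; the pattern checks that need previous rows stay vectorized.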