zipline backtesting using non-US (European) intrad

I'm trying to get zipline working with non-US, intraday data, that I've loaded into a pandas DataFrame:

                        BARC    HSBA     LLOY     STAN
Date                                                  
2014-07-01 08:30:00  321.250  894.55  112.105  1777.25
2014-07-01 08:32:00  321.150  894.70  112.095  1777.00
2014-07-01 08:34:00  321.075  894.80  112.140  1776.50
2014-07-01 08:36:00  321.725  894.80  112.255  1777.00
2014-07-01 08:38:00  321.675  894.70  112.290  1777.00

I've followed moving-averages tutorial here, replacing "AAPL" with my own symbol code, and the historical calls with "1m" data instead of "1d".

Then I do the final call using algo_obj.run(DataFrameSource(mydf)), where mydf is the dataframe above.

However there are all sorts of problems arising related to TradingEnvironment. According to the source code:

# This module maintains a global variable, environment, which is
# subsequently referenced directly by zipline financial
# components. To set the environment, you can set the property on
# the module directly:
#       from zipline.finance import trading
#       trading.environment = TradingEnvironment()
#
# or if you want to switch the environment for a limited context
# you can use a TradingEnvironment in a with clause:
#       lse = TradingEnvironment(bm_index="^FTSE", exchange_tz="Europe/London")
#       with lse:
# the code here will have lse as the global trading.environment
#           algo.run(start, end)

However, using the context doesn't seem to fully work. I still get errors, for example stating that my timestamps are before the market open (and indeed, looking at trading.environment.open_and_close the times are for the US market.

My question is, has anybody managed to use zipline with non-US, intra-day data? Could you point me to a resource and ideally example code on how to do this?

n.b. I've seen there are some tests on github that seem related to the trading calendars (tradincalendar_lse.py, tradingcalendar_tse.py , etc) - but this appears to only handle data at the daily level. I would need to fix:

open/close times
reference data for the benchmark
and probably more ...

回答1:

I've got this working after fiddling around with the tutorial notebook. Code sample below. It's using the DF mid, as described in the original question. A few points bear mentioning:

Trading Calendar I create one manually and assign to trading.environment, by using non_working_days in tradingcalendar_lse.py. Alternatively you could create one that fits your data exactly (however could be a problem for out-of-sample data). There are two fields that you need to define: trading_days and open_and_closes.
sim_params There is a problem with the default start/end values because they aren't timezone aware. So you must create a sim_params object and pass start/end parameters with a timezone.
Also, run() must be called with the argument overwrite_sim_params=False as calculate_first_open/close raise timestamp errors.

I should mention that it's also possible to pass pandas Panel data, with fields open,high,low,close,price and volume in the minor_axis. But in this case, the former fields are mandatory - otherwise errors are raised.

Note that this code only produces a daily summary of the performance. I'm sure there must be a way to get the result at a minute resolution (I thought this was set by emission_rate, but apparently it's not). If anybody knows please comment and I'll update the code. Also, not sure what the api call is to call 'analyze' (i.e. when using %%zipline magic in IPython, as in the tutorial, the analyze() method gets automatically called. How do I do this manually?)

import pytz
from datetime import datetime

from zipline.algorithm import TradingAlgorithm
from zipline.utils import tradingcalendar
from zipline.utils import tradingcalendar_lse
from zipline.finance.trading import TradingEnvironment
from zipline.api import order_target, record, symbol, history, add_history
from zipline.finance import trading

def initialize(context):
    # Register 2 histories that track daily prices,
    # one with a 100 window and one with a 300 day window
    add_history(10, '1m', 'price')
    add_history(30, '1m', 'price')

    context.i = 0


def handle_data(context, data):
    # Skip first 30 mins to get full windows
    context.i += 1
    if context.i < 30:
        return

    # Compute averages
    # history() has to be called with the same params
    # from above and returns a pandas dataframe.
    short_mavg = history(10, '1m', 'price').mean()
    long_mavg = history(30, '1m', 'price').mean()

    sym = symbol('BARC')

    # Trading logic
    if short_mavg[sym] > long_mavg[sym]:
        # order_target orders as many shares as needed to
        # achieve the desired number of shares.
        order_target(sym, 100)
    elif short_mavg[sym] < long_mavg[sym]:
        order_target(sym, 0)

    # Save values for later inspection
    record(BARC=data[sym].price,
           short_mavg=short_mavg[sym],
           long_mavg=long_mavg[sym])

def analyze(context,perf) : 
    perf["pnl"].plot(title="Strategy P&L")

# Create algorithm object passing in initialize and
# handle_data functions

# This is needed to handle the correct calendar. Assume that market data has the right index for tradeable days.
# Passing in env_trading_calendar=tradingcalendar_lse doesn't appear to work, as it doesn't implement open_and_closes
from zipline.utils import tradingcalendar_lse
trading.environment = TradingEnvironment(bm_symbol='^FTSE', exchange_tz='Europe/London')
#trading.environment.trading_days = mid.index.normalize().unique()
trading.environment.trading_days = pd.date_range(start=mid.index.normalize()[0],
                                                 end=mid.index.normalize()[-1],
                                                 freq=pd.tseries.offsets.CDay(holidays=tradingcalendar_lse.non_trading_days))

trading.environment.open_and_closes = pd.DataFrame(index=trading.environment.trading_days,columns=["market_open","market_close"])
trading.environment.open_and_closes.market_open = (trading.environment.open_and_closes.index + pd.to_timedelta(60*7,unit="T")).to_pydatetime()
trading.environment.open_and_closes.market_close = (trading.environment.open_and_closes.index + pd.to_timedelta(60*15+30,unit="T")).to_pydatetime()


from zipline.utils.factory import create_simulation_parameters
sim_params = create_simulation_parameters(
   start = pd.to_datetime("2014-07-01 08:30:00").tz_localize("Europe/London").tz_convert("UTC"),  #Bug in code doesn't set tz if these are not specified (finance/trading.py:SimulationParameters.calculate_first_open[close])
   end = pd.to_datetime("2014-07-24 16:30:00").tz_localize("Europe/London").tz_convert("UTC"),
   data_frequency = "minute",
   emission_rate = "minute",
   sids = ["BARC"])
algo_obj = TradingAlgorithm(initialize=initialize, 
                            handle_data=handle_data,
                            sim_params=sim_params)

# Run algorithm
perf_manual = algo_obj.run(mid,overwrite_sim_params=False) # overwrite == True calls calculate_first_open[close] (see above)

回答2:

@Luciano

You can add analyze(None, perf_manual)at the end of your code for automatically running the analyze process.