Description

I have a dataset of 10 sequences, where each sequence corresponds to one day of stock value recordings. Each sequence consists of 50 samples taken at 5-minute intervals, starting in the morning at 9:05 am. There is also one extra recording (the 51st sample), available only in the training set, that is taken 2 hours, not 5 minutes, after the last of the 50 samples. That 51st sample has to be predicted for the testing set, where the first 50 samples are also given.

I am using a PyBrain recurrent neural network for this problem, which groups samples into sequences. The label (commonly known as the target y) of each sample x_i is the sample at the next time step, x_(i+1), which is a typical formulation in time series prediction.
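To make that formulation concrete, here is a minimal sketch in plain Python (it does not use PyBrain). The helper name `next_step_pairs` and the shortened day of values are made up for illustration only.

```python
def next_step_pairs(sequence):
    """Pair each sample x_i with its target x_(i+1)."""
    # The last sample has no successor, so it only appears as a target.
    return list(zip(sequence[:-1], sequence[1:]))


# Hypothetical, shortened day of stock values (a real day has 50 samples).
day = [23, 31, 24, 15]
pairs = next_step_pairs(day)

for x, y in pairs:
    print("input:", x, "target:", y)
```

Each pair would be one (input, target) training step within a sequence.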

Example

A sequence for one day is something like:

    Signal id    Time      value
        1     -  9:05   -   23
        2     -  9:10   -   31
        3     -  9:15   -   24
       ...    -  ...    -   ...
       50     -  13:15  -   15

Below is the "2 hours later" target, which is given for the training set and has to be predicted for the testing set:
       51     -  15:15   -   11

Question

Now that my recurrent neural network (RNN) has been trained on these 10 sequences, how would I use it on a new sequence to predict the stock value 2 hours after the last sample in that sequence?

Please note that I also have the value recorded 2 hours after the last sample for each of the training sequences, but I am not sure how to incorporate it when training the RNN, since the network expects identical time intervals between samples. Thanks!

I hope this helps you out.

The recurrent network structure

A few tips

Choosing your recurrent network

The more mature Long Short-Term Memory (LSTM) neural network is a great fit for this kind of task. An LSTM is able to detect common "shapes" and "variations" in the stock value "graph", and there is A LOT of research that tries to prove that such shapes actually occur in real life! See this link for an example.

Accuracy

If you want the network to achieve higher accuracy, I would recommend also feeding it the stock values from the same date in the previous year, so that the number of inputs doubles from 50 to 100. Though the network might be well optimised on your dataset, it will never be able to predict the unpredictable behaviour of the future ;)
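As a rough sketch of what doubling the inputs could look like, the snippet below concatenates the current day's 50 samples with the 50 samples from the same date a year earlier. The function name `build_input` and the placeholder values are assumptions for illustration; how you look up last year's day depends on your data.

```python
def build_input(today, last_year):
    """Concatenate today's 50 samples with last year's 50 samples."""
    if len(today) != 50 or len(last_year) != 50:
        raise ValueError("expected 50 samples per day")
    # Result: 100 inputs, today's values first, last year's values second.
    return list(today) + list(last_year)


# Placeholder values only; real stock values would go here.
today = [float(i) for i in range(50)]
last_year = [float(i) + 0.5 for i in range(50)]

features = build_input(today, last_year)
print(len(features))
```

The resulting 100-value vector would replace the 50-value input, with the 51st sample still used as the training target.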