Averaging over the batch dimension in Keras

Posted 2019-08-17 18:55

Question:

I've got a problem where I want to predict one time series from many time series. My input has shape (batch_size, time_steps, features) and my output should have shape (1, time_steps, features).

I can't figure out how to average over the batch dimension N.
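In plain NumPy terms, the reduction I'm after is just a mean over the first axis (a toy illustration of the shapes only, not the real model):

import numpy as np

batch = np.random.normal(size=(2000, 100, 1))  # (batch_size, time_steps, features)

# average over the batch dimension, keeping it as a length-1 axis
averaged = batch.mean(axis=0, keepdims=True)
print(averaged.shape)  # (1, 100, 1)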

Here's a dummy example. First, dummy data where the output is a linear function of the N = 2000 input time series:

import numpy as np
time = 100
N = 2000

dat = np.zeros((N, time))
for i in range(N):
    # each series is a randomly scaled and shifted sine wave
    dat[i, :] = np.sin(np.arange(time)) * np.random.normal(size=1) + np.random.normal(size=1)

y = dat.T @ np.random.normal(size = N)

Now I'll define a time series model (using 1-D conv nets):

from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda
from keras.optimizers import Adam
from keras import backend as K

n_filters = 2
filter_width = 3
dilation_rates = [2**i for i in range(5)] 
inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width, 
               padding='causal',
               activation = "relu",
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)

model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape

Out[43]: (2000, 100, 1)

The output is the wrong shape! Next, I tried using an averaging layer, but I got this weird error:

def av_over_batches(x):
    return K.mean(x, axis=0)

x = Lambda(av_over_batches)(x)

model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.predict(dat.reshape(N, time, 1)).shape

Traceback (most recent call last):

  File "<ipython-input-3-d43ccd8afa69>", line 4, in <module>
    model.predict(dat.reshape(N, time, 1)).shape

  File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1169, in predict
    steps=steps)

  File "/home/me/.local/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 302, in predict_loop
    outs[i][batch_start:batch_end] = batch_out

ValueError: could not broadcast input array from shape (100,1) into shape (32,1)

Where does 32 come from? (Incidentally, I got the same number in my real data, not just in the MWE).

But the main question is: how can I build a network that averages over the input batch dimension?

Answer 1:

I would approach the problem in a different way.

Problem: you want to predict one time series from a set of time series. Say you have three time series TS1, TS2, TS3, each with 100 time steps, and you want to predict a single target time series y (one value per time step).

My approach to this problem would be as follows:

Group the values of all the time series at each time step together and feed them to an LSTM. If some time series are shorter than others, you can pad them; similarly, if a set has fewer time series, pad it as well.
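The padding mentioned above could be done with, for example, keras.preprocessing.sequence.pad_sequences (a rough sketch with made-up lengths; it is not used in the example below):

from keras.preprocessing.sequence import pad_sequences

# three series of unequal length, zero-padded at the end up to the longest one
series = [[1, 2, 3], [4, 5], [6]]
padded = pad_sequences(series, padding='post', dtype='float32')
print(padded)
# [[1. 2. 3.]
#  [4. 5. 0.]
#  [6. 0. 0.]]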

Example:

import numpy as np
np.random.seed(33)

time = 100
N = 5000
k = 5

magic = np.random.normal(size = k)

x = list()
y = list()
for i in range(N):
    dat = np.zeros((k, time))
    for j in range(k):
        dat[j, :] = np.sin(np.arange(time)) * np.random.normal(size=1) + np.random.normal(size=1)
    x.append(dat)
    y.append(dat.T @ magic)

So we want to predict a time series of 100 steps from a set of k = 5 time series, and we want the model to learn the magic weights.
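A quick sanity check of the shapes generated above (just a check, assuming the loop as written):

print(len(x), x[0].shape)  # 5000 (5, 100): k = 5 series of 100 steps per sample
print(len(y), y[0].shape)  # 5000 (100,): one target value per time step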

from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Lambda, LSTM
from keras.optimizers import Adam
from keras import backend as K
import matplotlib.pyplot as plt

input = Input(shape=(time, k))
lstm = LSTM(32, return_sequences=True)(input)
output = Dense(1,activation='sigmoid')(lstm)

model = Model(inputs = input, outputs = output)
model.compile(optimizer = Adam(), loss='mean_squared_error')

data_x = np.zeros((N,100,5))
data_y = np.zeros((N,100,1))

for i in range(N):
    data_x[i] = x[i].T.reshape(100,5)
    data_y[i] = y[i].reshape(100,1)

from sklearn.preprocessing import StandardScaler

ss_x = StandardScaler()
ss_y = StandardScaler()

data_x = ss_x.fit_transform(data_x.reshape(N,-1)).reshape(N,100,5)
data_y = ss_y.fit_transform(data_y.reshape(N,-1)).reshape(N,100,1)

# Leave the last sample for testing; split the rest into train and validation
model.fit(data_x[:-1], data_y[:-1], batch_size=64, epochs=100, validation_split=0.25)

The validation loss was still going down when I stopped training. Let's see how good the prediction is:

y_hat = model.predict(data_x[-1].reshape(-1,100,5))
plt.plot(data_y[-1], label='y')
plt.plot(y_hat.reshape(100), label='y_hat')
plt.legend(loc='upper left')

The results are promising. Training for more epochs and tuning the hyperparameters should bring us even closer to the magic weights. One can also try stacked and bidirectional LSTMs.
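A stacked bidirectional variant might look roughly like this (only a sketch, not something benchmarked here; the names inp2 and model2 are arbitrary, chosen so as not to clobber the model above):

from keras.layers import Bidirectional

inp2 = Input(shape=(time, k))
h = Bidirectional(LSTM(32, return_sequences=True))(inp2)
h = Bidirectional(LSTM(32, return_sequences=True))(h)  # second stacked recurrent layer
out2 = Dense(1)(h)

model2 = Model(inputs=inp2, outputs=out2)
model2.compile(optimizer=Adam(), loss='mean_squared_error')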

I feel RNNs are better suited to time series data than CNNs.

Data format: let's say time steps = 3.

Time series 1 = [1,2,3]

Time series 2 = [4,5,6]

Time series 3 = [7,8,9]

Time series 4 = [10,11,12]

Y = [100,200,300]

For a batch size of 1

[[1,4,7,10],[2,5,8,11],[3,6,9,12]] -> LSTM -> [100,200,300]
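In NumPy terms, that layout is just the series stacked as columns (a toy illustration of the numbers above):

import numpy as np

ts = np.array([[1, 2, 3],      # time series 1
               [4, 5, 6],      # time series 2
               [7, 8, 9],      # time series 3
               [10, 11, 12]])  # time series 4
target = np.array([100, 200, 300])

# shape (time_steps, n_series): one row per time step, one column per series
lstm_input = ts.T
print(lstm_input.tolist())  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]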