LSTM - predicting on a sliding window data

My training data is an overlapping sliding window of users daily data. it's shape is (1470, 3, 256, 18):
1470 batches of 3 days of data, each day has 256 samples of 18 features each.

My targets shape is (1470,): a label value for each batch.

I want to train an LSTM to predict a [3 days batch] -> [one target]
The 256 day samples is padded with -10 for days that were missing 256 sampels

I've written the following code to build the model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout,Dense,Masking,Flatten
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.callbacks import TensorBoard,ModelCheckpoint
from tensorflow.keras import metrics

def build_model(num_samples, num_features):

  opt = RMSprop(0.001) 

  model = Sequential()
  model.add(Masking(mask_value=-10., input_shape=(num_samples, num_features)))
  model.add(LSTM(32, return_sequences=True, activation='tanh'))
  model.add(Dropout(0.3))
  model.add(LSTM(16, return_sequences=False, activation='tanh'))
  model.add(Dropout(0.3))
  model.add(Dense(16, activation='tanh'))
  model.add(Dense(8, activation='tanh'))
  model.add(Dense(1))
  model.compile(loss='mse', optimizer=opt ,metrics=['mae','mse'])
  return model

model = build_model(256,18)
model.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
masking_7 (Masking)          (None, 256, 18)           0         
_________________________________________________________________
lstm_14 (LSTM)               (None, 256, 32)           6528      
_________________________________________________________________
dropout_7 (Dropout)          (None, 256, 32)           0         
_________________________________________________________________
lstm_15 (LSTM)               (None, 16)                3136      
_________________________________________________________________
dropout_8 (Dropout)          (None, 16)                0         
_________________________________________________________________
dense_6 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_7 (Dense)              (None, 8)                 136       
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 9         
=================================================================
Total params: 10,081
Trainable params: 10,081
Non-trainable params: 0
_________________________________________________________________

I can see that the shapes are incompatible, but I can't figure out how to change the code to fit my problem.

Any help would be appreciated

Update: I've reshaped my data like so:

train_data.reshape(1470*3, 256, 18)

is that right?

I think you are looking for TimeDistributed(LSTM(...)) (source)

day, num_samples, num_features = 3, 256, 18

model = Sequential()
model.add(Masking(mask_value=-10., input_shape=(day, num_samples, num_features)))
model.add(TimeDistributed(LSTM(32, return_sequences=True, activation='tanh')))
model.add(Dropout(0.3))
model.add(TimeDistributed(LSTM(16, return_sequences=False, activation='tanh')))
model.add(Dropout(0.3))
model.add(Dense(16, activation='tanh'))
model.add(Dense(8, activation='tanh'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam' ,metrics=['mae','mse'])

model.summary()