I currently have a (1631160, 78) np array as my input to a neural network. I would like to try something with an LSTM, which requires a 3D structure as input data. I'm currently using the following code to generate the 3D structure needed, but it is super slow (ETA > 1 day). Is there a better way to do this with numpy?
My current code to generate data:
def transform_for_rnn(input_x, input_y, window_size):
    output_x = None
    start_t = time.time()
    for i in range(len(input_x)):
        if i > 100 and i % 100 == 0:
            sys.stdout.write('\rTransform Data: %d/%d\tETA:%s'%(i, len(input_x), str(datetime.timedelta(seconds=(time.time()-start_t)/i * (len(input_x) - i)))))
            sys.stdout.flush()
        if output_x is None:
            output_x = np.array([input_x[i:i+window_size, :]])
        else:
            tmp = np.array([input_x[i:i+window_size, :]])
            output_x = np.concatenate((output_x, tmp))
    print
    output_y = input_y[window_size:]
    assert len(output_x) == len(output_y)
    return output_x, output_y
Here's an approach using NumPy strides to vectorize the creation of output_x:
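A minimal sketch of such a strided construction, assuming a 2D input array. The helper name `strided_windows` and the small stand-in array are hypothetical and used only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def strided_windows(input_x, window_size):
    """Return every run of `window_size` consecutive rows as a 3D view."""
    n_rows, n_cols = input_x.shape
    row_stride, col_stride = input_x.strides
    n_windows = n_rows - window_size + 1
    if n_windows < 1:
        raise ValueError("window_size must not exceed the number of rows")
    # Moving to the next window and moving to the next row inside a window
    # both advance by one row of input_x, so row_stride appears twice.
    return as_strided(
        input_x,
        shape=(n_windows, window_size, n_cols),
        strides=(row_stride, row_stride, col_stride),
    )


# Small stand-in for the (1631160, 78) array from the question.
input_x = np.arange(40, dtype=np.float64).reshape(10, 4)
output_x = strided_windows(input_x, window_size=3)
```

This yields N - window_size + 1 windows, one more than the N - window_size targets in `input_y[window_size:]`. If window i is meant to be paired with `input_y[i + window_size]`, as the question's code suggests, dropping the last window with `output_x[:-1]` makes the lengths match.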
Sample run:
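An illustrative run on a tiny array (the values are made up so the windows are easy to read; the variable names are assumptions):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Rows are [0 1 2], [3 4 5], [6 7 8], [9 10 11].
input_x = np.arange(12).reshape(4, 3)
window_size = 2

m, n = input_x.strides
nrows = input_x.shape[0] - window_size + 1
out = as_strided(
    input_x,
    shape=(nrows, window_size, input_x.shape[1]),
    strides=(m, m, n),
)

print(out.shape)        # (3, 2, 3)
print(out[0].tolist())  # [[0, 1, 2], [3, 4, 5]]
print(out[1].tolist())  # [[3, 4, 5], [6, 7, 8]]
print(out[2].tolist())  # [[6, 7, 8], [9, 10, 11]]
```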
This creates a view into the input array, so it is memory-efficient: no data is copied. In most cases, this should also benefit the performance of further operations involving it. Let's verify that it is indeed a view:
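One way to check this, sketched with `np.shares_memory` on a small stand-in array (the array and variable names are assumptions):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

input_x = np.arange(12.0).reshape(4, 3)
m, n = input_x.strides
out = as_strided(input_x, shape=(3, 2, 3), strides=(m, m, n))

# The strided result reuses the input's buffer ...
is_view = np.shares_memory(input_x, out)
# ... whereas an explicit copy owns separate memory.
copy_is_shared = np.shares_memory(input_x, out.copy())

print(is_view)         # True
print(copy_is_shared)  # False
```

For very large arrays, `np.may_share_memory` is a cheaper bounds-only check, at the cost of possible false positives.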
Another sure-shot way to verify would be to set some values into output and check the input:
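A small sketch of that check, using a zero-filled stand-in array (an assumption for illustration):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

input_x = np.zeros((4, 3))
m, n = input_x.strides
out = as_strided(input_x, shape=(3, 2, 3), strides=(m, m, n))

# The first row of window 1 is row 1 of input_x.
out[1, 0, :] = 7.0

print(input_x[1].tolist())  # [7.0, 7.0, 7.0]
# Window 0 overlaps the same row, so the write is visible there too.
print(out[0, 1].tolist())   # [7.0, 7.0, 7.0]
```

Because overlapping windows alias the same memory, a single write shows up in several windows. If the windows will be modified in place, working on `out.copy()` or creating the view with `writeable=False` avoids surprises.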