How to use Bidirectional RNN and Conv1D in keras

Published 2019-03-02 13:56

Question:

I am brand new to Deep Learning, so I'm reading through Deep Learning with Keras by Antonio Gulli and learning a lot. I want to start using some of the concepts. I want to try to implement a neural network with a 1-dimensional convolutional layer that feeds into a bidirectional recurrent layer (like the paper below). All the tutorials and code snippets I've encountered either don't implement anything remotely similar to this (e.g. they do image recognition) or use an older version of keras with different functions and usage.


What I'm trying to do is a variation of this paper:

(1) convert DNA sequences to one-hot encoding vectors;

(2) use a 1 dimensional convolutional neural network;

(3) with max pooling;

(4) send the output to a bidirectional RNN;

(5) classify the input;


I cannot figure out how to get the shapes to match up on the Bidirectional RNN. I can't even get an ordinary RNN to work at this stage. How can I restructure the incoming layers to work with a Bidirectional RNN?

Note: The original code came from https://github.com/uci-cbcl/DanQ/blob/master/DanQ_train.py but I simplified the output layer to just do binary classification. This process was described (kind of) in https://github.com/fchollet/keras/issues/3322 but I cannot get it to work with the updated keras. The original code (and the 2nd link) works on a very large dataset, so I am generating some fake data to illustrate the concept. They also use an older version of keras in which key functionality has since changed.

# Imports
import tensorflow as tf
import numpy as np
from tensorflow.python.keras._impl.keras.layers.core import *
from tensorflow.python.keras._impl.keras.layers import Conv1D, MaxPooling1D, SimpleRNN, Bidirectional, Input
from tensorflow.python.keras._impl.keras.models import Model, Sequential

# Set up TensorFlow backend
K = tf.keras.backend
K.set_session(tf.Session())
np.random.seed(0) # For keras?

# Constants
NUMBER_OF_POSITIONS = 40
NUMBER_OF_CLASSES = 2
NUMBER_OF_SAMPLES_IN_EACH_CLASS = 25

# Generate sequences
# https://pastebin.com/GvfLQte2
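# (Hypothetical stand-in for the pastebin snippet above, just so the code
#  below is self-contained: random one-hot "sequences" and one-hot "labels".)
n_samples = NUMBER_OF_CLASSES * NUMBER_OF_SAMPLES_IN_EACH_CLASS
sequences = np.eye(4)[np.random.randint(0, 4, size=(n_samples, NUMBER_OF_POSITIONS))]
labels = np.eye(NUMBER_OF_CLASSES)[np.repeat(np.arange(NUMBER_OF_CLASSES), NUMBER_OF_SAMPLES_IN_EACH_CLASS)]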

# Build model
# ===========
# Input Layer
input_layer = Input(shape=(NUMBER_OF_POSITIONS,4))
# Hidden Layers
y = Conv1D(100, 10, strides=1, activation="relu", )(input_layer)
y = MaxPooling1D(pool_size=5, strides=5)(y)
y = Flatten()(y)
y = Bidirectional(SimpleRNN(100, return_sequences = True, activation="tanh", ))(y)
y = Flatten()(y)
y = Dense(100, activation='relu')(y)
# Output layer
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)

model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="categorical_crossentropy", )
model.summary()


# ~/anaconda/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/layers/recurrent.py in build(self, input_shape)
#    1049     input_shape = tensor_shape.TensorShape(input_shape).as_list()
#    1050     batch_size = input_shape[0] if self.stateful else None
# -> 1051     self.input_dim = input_shape[2]
#    1052     self.input_spec[0] = InputSpec(shape=(batch_size, None, self.input_dim))
#    1053 

# IndexError: list index out of range

Answer 1:

You don't need to restructure anything at all to get the output of a Conv1D layer into an LSTM layer.

So, the problem is simply the presence of the Flatten layer, which destroys the shape.

These are the shapes used by Conv1D and LSTM:

  • Conv1D: (batch, length, channels)
  • LSTM: (batch, timeSteps, features)

Length is the same as timeSteps, and channels is the same as features.

Using the Bidirectional wrapper won't change anything either. It only doubles the number of output features, because the forward and backward outputs are concatenated.
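
For example, here is a minimal sketch of the hidden layers, keeping the sizes from the question, once the Flatten between the pooling and the recurrent layer is removed (the shape comments show what each layer produces):

input_layer = Input(shape=(NUMBER_OF_POSITIONS, 4))                   # (batch, 40, 4)
y = Conv1D(100, 10, strides=1, activation="relu")(input_layer)        # (batch, 31, 100)
y = MaxPooling1D(pool_size=5, strides=5)(y)                           # (batch, 6, 100)
# No Flatten here: (batch, length, channels) is already the
# (batch, timeSteps, features) shape the recurrent layer expects.
y = Bidirectional(SimpleRNN(100, return_sequences=True, activation="tanh"))(y)  # (batch, 6, 200)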


Classifying.

If you're going to classify the entire sequence as a whole, your last LSTM must use return_sequences=False. (Alternatively, you can keep return_sequences=True and put Flatten + Dense after it.)
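
A minimal sketch of that first case: take the (batch, 6, 100) pooling output from the sketch above and finish with a non-sequence-returning recurrent layer plus the question's dense head:

y = Bidirectional(SimpleRNN(100, return_sequences=False, activation="tanh"))(y)  # (batch, 200)
y = Dense(100, activation='relu')(y)
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)      # one label per whole sequence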

If you're going to classify each step of the sequence, all your LSTMs should have return_sequences=True. You should not flatten the data after them.
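
A sketch of that second case, again starting from the (batch, 6, 100) pooling output: keep return_sequences=True and let the Dense head act on every step (Keras applies a Dense layer to the last axis of a 3D tensor, so this is equivalent to wrapping it in TimeDistributed):

y = Bidirectional(SimpleRNN(100, return_sequences=True, activation="tanh"))(y)   # (batch, 6, 200)
output_layer = Dense(NUMBER_OF_CLASSES, activation="softmax")(y)                  # (batch, 6, NUMBER_OF_CLASSES), one label per step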