Question:
I was trying to use an RNN (in particular, an LSTM) for sequence prediction, and I ran into some issues. For example:
sent_1 = "I am flying to Dubain"
sent_2 = "I was traveling from US to Dubai"
What I am trying to do here is predict the next word given the previous ones, using a simple RNN based on this benchmark for building a PTB LSTM model. But does the num_steps parameter (used for unrolling over the previous hidden states) have to remain the same in each TensorFlow epoch? Basically, batching sentences is not possible, as the sentences in a batch vary in length.
# inputs = [tf.squeeze(input_, [1])
#            for input_ in tf.split(1, num_steps, inputs)]
# outputs, states = rnn.rnn(cell, inputs, initial_state=self._initial_state)
Here, num_steps would need to change for every sentence in my case. I have tried several hacks, but nothing seems to work.
Answer 1:
You can use the ideas of bucketing and padding, which are described in:
Sequence-to-Sequence Models
Also, the rnn function that creates the RNN network accepts the parameter sequence_length.
As an example, you can create buckets of sentences of the same size, pad them with the necessary number of zeros (or placeholders that stand for the zero word), and afterwards feed them along with seq_length = len(zero_words).
seq_length = tf.placeholder(tf.int32)
outputs, states = rnn.rnn(cell, inputs, initial_state=initial_state, sequence_length=seq_length)

sess = tf.Session()
feed = {
    seq_length: 20,
    # other feeds
}
sess.run(outputs, feed_dict=feed)
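To make the padding step concrete, here is a minimal sketch of zero-padding a bucket of sentences (pure Python/NumPy; the names pad_batch and PAD_ID, and the word-id encoding, are my own assumptions, not part of the answer above):

import numpy as np

PAD_ID = 0  # assumed id reserved for the zero/padding word

def pad_batch(sentences, bucket_size):
    # sentences: list of lists of word ids, each at most bucket_size long.
    # Returns a [batch, bucket_size] array plus the true length of each sentence.
    batch = np.full((len(sentences), bucket_size), PAD_ID, dtype=np.int32)
    lengths = np.zeros(len(sentences), dtype=np.int32)
    for i, sent in enumerate(sentences):
        batch[i, :len(sent)] = sent
        lengths[i] = len(sent)
    return batch, lengths

# Two sentences of different length in a bucket of size 8:
padded, lengths = pad_batch([[4, 8, 15], [16, 23, 42, 7, 9]], bucket_size=8)
# lengths == [3, 5]; these are the values to feed as sequence_length.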
Take a look at this reddit thread as well:
Tensorflow basic RNN example with 'variable length' sequences
Answer 2:
You can use dynamic_rnn instead and specify the length of every sequence, even within one batch, by passing an array to the sequence_length parameter.
An example is below:
import tensorflow as tf

def length(sequence):
    # A frame counts as used if any of its features is non-zero.
    used = tf.sign(tf.reduce_max(tf.abs(sequence), reduction_indices=2))
    length = tf.reduce_sum(used, reduction_indices=1)  # number of used frames
    length = tf.cast(length, tf.int32)
    return length

max_length = 100
frame_size = 64
num_hidden = 200

sequence = tf.placeholder(tf.float32, [None, max_length, frame_size])
output, state = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.GRUCell(num_hidden),
    sequence,
    dtype=tf.float32,
    sequence_length=length(sequence),
)
The code is taken from a perfect article on the topic; please check it as well.
Update: another great post on dynamic_rnn vs. rnn is also worth reading.
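For illustration, here is a hypothetical way to run the graph above on a zero-padded NumPy batch (the batch contents are made up; on very old TensorFlow versions the initializer is tf.initialize_all_variables() instead):

import numpy as np

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Two sequences zero-padded to max_length; length() infers 10 and 25.
batch = np.zeros((2, max_length, frame_size), dtype=np.float32)
batch[0, :10] = np.random.rand(10, frame_size)
batch[1, :25] = np.random.rand(25, frame_size)

out = sess.run(output, feed_dict={sequence: batch})
# out has shape (2, 100, 200); rows past each sequence's length are zeros.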
Answer 3:
You can limit the maximum length of your input sequences, pad the shorter ones to that length, record the length of each sequence, and use tf.nn.dynamic_rnn. It processes each input sequence as usual, but after the last element of a sequence, indicated by seq_length, it just copies the cell state through and outputs a zero tensor.
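Since the outputs past each sequence's end are zeros, a common follow-up is to pick out the last valid output per sequence. A minimal sketch of one way to do this (the helper name last_relevant is my own, not from the answer):

def last_relevant(output, length):
    # output: [batch_size, max_length, num_hidden]; length: [batch_size] int32.
    batch_size = tf.shape(output)[0]
    max_length = tf.shape(output)[1]
    out_size = int(output.get_shape()[2])
    # Flatten to [batch_size * max_length, num_hidden] and gather row
    # i * max_length + (length[i] - 1) for each sequence i.
    index = tf.range(0, batch_size) * max_length + (length - 1)
    flat = tf.reshape(output, [-1, out_size])
    return tf.gather(flat, index)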
Answer 4:
You can use the ideas of bucketing and padding, which are described in
Sequence-to-Sequence Models
Also, the rnn function that creates the RNN network accepts the parameter sequence_length.
As an example, you can create buckets of sentences of the same size, pad them with the necessary number of zeros (or placeholders that stand for the zero word), and afterwards feed them along with seq_length = len(zero_words).
seq_length = tf.placeholder(tf.int32)
outputs, states = rnn.rnn(cell, inputs, initial_state=initial_state, sequence_length=seq_length)

sess = tf.Session()
feed = {
    seq_length: 20,
    # other feeds
}
sess.run(outputs, feed_dict=feed)
Here, the most important thing is: if you want to use the states obtained from one sentence as the initial state for the next sentence while providing sequence_length (say it is 20 and the sentence after padding is 50), you want the state obtained at the 20th time step. For that, do
tf.pack(states)
After that call:
for i in range(len(sentences)):
    # early_stop is the true (unpadded) length of the current sentence,
    # so states[early_stop - 1] is the state at its last real time step.
    state_mat = session.run(states, {
        m.input_data: x, m.targets: y, m.initial_state: state, m.early_stop: early_stop})
    state = state_mat[early_stop - 1, :, :]
Answer 5:
Sorry to post on a dead issue, but I just submitted a PR for a better solution. dynamic_rnn is extremely flexible but abysmally slow. It works if it is your only option, but CuDNN is much faster. This PR adds support for variable lengths to CuDNNLSTM, so you will hopefully be able to use that soon.
You need to sort the sequences by descending length. Then you can pack_sequence, run your RNNs, and unpack_sequence.
https://github.com/tensorflow/tensorflow/pull/22308