TL;DR: I want to know how to use a bi-LSTM-CTC TensorFlow model in an Android application.
I have succeeded in training my bi-LSTM-CTC TensorFlow model, and now I want to use it in my handwriting-recognition Android application. Here's the part of the code that defines the graph I used:
self.inputs = tf.placeholder(tf.float32, [None, None, network_config.num_features], name="input")
self.labels = tf.sparse_placeholder(tf.int32, name="label")
self.seq_len = tf.placeholder(tf.int32, [None], name="seq_len_input")
logits = self._bidirectional_lstm_layers(
    network_config.num_hidden_units,
    network_config.num_layers,
    network_config.num_classes
)
self.global_step = tf.Variable(0, trainable=False)
self.loss = tf.nn.ctc_loss(labels=self.labels, inputs=logits, sequence_length=self.seq_len)
self.cost = tf.reduce_mean(self.loss)
self.optimizer = tf.train.AdamOptimizer(network_config.learning_rate).minimize(self.cost)
self.decoded, self.log_prob = tf.nn.ctc_beam_search_decoder(inputs=logits, sequence_length=self.seq_len, merge_repeated=False)
self.dense_decoded = tf.sparse_tensor_to_dense(self.decoded[0], default_value=-1, name="output")
I also succeeded in freezing and optimizing the graph following the freeze-graph and optimize-graph code in this tutorial.
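Roughly, those two steps boil down to something like the sketch below. The file paths are placeholders of mine and the node names are the ones defined in the graph above, so treat this as an outline rather than the tutorial's exact code.

# Sketch only: freeze the trained checkpoint into a single GraphDef,
# then optimize it for inference. Paths are placeholders; "input",
# "seq_len_input" and "output" are the node names from the graph above.
import tensorflow as tf
from tensorflow.python.tools import freeze_graph, optimize_for_inference_lib

freeze_graph.freeze_graph(
    input_graph="model/graph.pbtxt",   # assumed path to the exported graph
    input_saver="",
    input_binary=False,
    input_checkpoint="model/ckpt",     # assumed checkpoint prefix
    output_node_names="output",
    restore_op_name="save/restore_all",
    filename_tensor_name="save/Const:0",
    output_graph="model/frozen.pb",
    clear_devices=True,
    initializer_nodes="")

graph_def = tf.GraphDef()
with tf.gfile.GFile("model/frozen.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Note: seq_len_input is an int32 placeholder, so the placeholder type
# passed here may need adjusting depending on the TF version.
optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def, ["input", "seq_len_input"], ["output"],
    tf.float32.as_datatype_enum)

with tf.gfile.GFile("model/optimized.pb", "wb") as f:
    f.write(optimized.SerializeToString())

Here's the part of the code that is supposed to run the model in the Android app: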
bitmap = Bitmap.createScaledBitmap(bitmap, 1024, 128, true);
int[] intValues = new int[bitmap.getWidth() * bitmap.getHeight()];
bitmap.getPixels(intValues, 0, bitmap.getWidth(), 0, 0, bitmap.getWidth(), bitmap.getHeight());
float[] floatValues = new float[bitmap.getWidth() * bitmap.getHeight()];
for (int i = 0; i < intValues.length; ++i) {
    final int val = intValues[i];
    // use the red channel as the grayscale pixel value
    floatValues[i] = (val >> 16) & 0xFF;
}
float[] result = new float[80];
long[] INPUT_SIZE = new long[]{1, bitmap.getHeight(), bitmap.getWidth()};
inferenceInterface.feed(config.getInputName(), floatValues, INPUT_SIZE);
inferenceInterface.feed("seq_len_input", new int[]{bitmap.getWidth()}, 1);
inferenceInterface.run(config.getOutputNames());
inferenceInterface.fetch(config.getOutputNames()[0], result);
return result.toString();
However, I run into different problems depending on which graph I use. If I use the frozen graph, I get this error:
Caused by: java.lang.IllegalArgumentException: No OpKernel was registered to support
Op 'SparseToDense' with these attrs. Registered devices: [CPU], Registered kernels:
device='CPU'; T in [DT_STRING]; Tindices in [DT_INT64]
device='CPU'; T in [DT_STRING]; Tindices in [DT_INT32]
device='CPU'; T in [DT_BOOL]; Tindices in [DT_INT64]
device='CPU'; T in [DT_BOOL]; Tindices in [DT_INT32]
device='CPU'; T in [DT_FLOAT]; Tindices in [DT_INT64]
device='CPU'; T in [DT_FLOAT]; Tindices in [DT_INT32]
device='CPU'; T in [DT_INT32]; Tindices in [DT_INT64]
device='CPU'; T in [DT_INT32]; Tindices in [DT_INT32]
[[Node: output = SparseToDense[T=DT_INT64, Tindices=DT_INT64, validate_indices=true](CTCBeamSearchDecoder, CTCBeamSearchDecoder:2, CTCBeamSearchDecoder:1, output/default_value)]]
If I use the optimized frozen graph, I encounter this error:
java.io.IOException: Not a valid TensorFlow Graph serialization: NodeDef expected inputs '' do not match 1 inputs
specified; Op<name=Const; signature= -> output:dtype; attr=value:tensor; attr=dtype:type>;
NodeDef: stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/while/add/y = Const[dtype=DT_INT32,
value=Tensor<type: int32 shape: [] values: 1>](stack_bidirectional_rnn/cell_0/bidirectional_rnn/bw/bw/while/Switch:1)
So, my question: how do I address these errors?
I have since made it work; the solution can also be found in this GitHub issue.
Apparently, the issue was the types being used: the decoder's sparse output is int64, but the SparseToDense kernel on Android is only registered for int32 (among others), not int64. To address that, I cast the sparse tensor elements to int32 before densifying them.
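Something along these lines replaces the dense_decoded line from the graph definition above (a sketch; the linked issue has the details):

# Cast the decoder's sparse output to int32 before densifying, so the
# SparseToDense op behind the "output" node is created with T=DT_INT32
# (which is registered on Android) instead of DT_INT64.
self.dense_decoded = tf.sparse_to_dense(
    tf.to_int32(self.decoded[0].indices),
    tf.to_int32(self.decoded[0].dense_shape),
    tf.to_int32(self.decoded[0].values),
    default_value=-1,
    name="output")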
Running the application after that gave me another error.
For some weird reason, changing the image width from 1024 to 128 in the Java code fixed that error; the change is sketched below.
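That is, the only line that changes is the bitmap scaling at the top of the inference code, which makes the fed input tensor [1, 128, 128] and the fed sequence length 128:

// Scale the input to width 128 instead of 1024; the rest of the feed code
// stays the same, so INPUT_SIZE becomes {1, 128, 128} and seq_len {128}.
bitmap = Bitmap.createScaledBitmap(bitmap, 128, 128, true);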
Running the app again then gave me yet another error; this one arose when fetching the output. From that, I knew the model ran successfully, but the application just couldn't fetch the results.
Silly ol' me forgot that the output is an integer array, not a float array, so I changed the type of the result array to an int array.
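In the inference code above, that amounts to roughly the following (a sketch; getOutputNames()[0] is assumed to resolve to the "output" node):

// The "output" node produces int32 labels, so fetch into an int[] buffer
// instead of a float[].
int[] result = new int[80];
inferenceInterface.fetch(config.getOutputNames()[0], result);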
That made the application work. The model's accuracy is not good since it wasn't trained properly; I was just trying to get it working in the application. It's time for some serious training!