convert fake quantized tensorflow model(.pb) to te

I tried to follow instructions in tensorflow quantization to generate a quantized tensorflow lite model.

First, I use tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph() in my training process to insert fake quantization node into the graph, and generate a frozen pb file(model.pb) finally.

Second, I use following command to convert my fake quantized tensorflow model to quantized tensorflow lite model.

bazel-bin/tensorflow/contrib/lite/toco/toco \
--input_file=model.pb \
--input_format=TENSORFLOW_GRAPHDEF \
--output_format=TFLITE \
--output_file=model.tflite \
--inference_type=QUANTIZED_UINT8 --input_shapes=1,1:1,5002 \
--input_arrays=Test/Model/input,Test/Model/apps \
--output_arrays=Test/Model/output_probs,Test/Model/final_state  \
--mean_values=127.5,127.5 --std_values=127.5,127.5 --allow_custom_ops

The covert process failed with following logs:

2018-03-28 18:00:38.348403: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] Before Removing unused ops: 118 operators, 193 arrays (0 quantized)
2018-03-28 18:00:38.349394: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] Before general graph transformations: 118 operators, 193 arrays (0 quantized)
2018-03-28 18:00:38.382854: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] After general graph transformations pass 1: 57 operators, 103 arrays (1 quantized)
2018-03-28 18:00:38.384327: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] After general graph transformations pass 2: 56 operators, 101 arrays (1 quantized)
2018-03-28 18:00:38.385235: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] After general graph transformations pass 3: 55 operators, 100 arrays (1 quantized)
2018-03-28 18:00:38.385995: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] Before pre-quantization graph transformations: 55 operators, 100 arrays (1 quantized)
2018-03-28 18:00:38.386047: W tensorflow/contrib/lite/toco/graph_transformations/hardcode_min_max.cc:131] Skipping min-max setting for {TensorFlowSplit operator with output Test/Model/RNN/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/split} because output Test/Model/RNN/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/split already has min-max.
2018-03-28 18:00:38.386076: W tensorflow/contrib/lite/toco/graph_transformations/hardcode_min_max.cc:131] Skipping min-max setting for {TensorFlowSplit operator with output Test/Model/RNN/RNN/multi_rnn_cell/cell_1/basic_lstm_cell/split} because output Test/Model/RNN/RNN/multi_rnn_cell/cell_1/basic_lstm_cell/split already has min-max.
2018-03-28 18:00:38.386328: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] After pre-quantization graph transformations pass 1: 48 operators, 93 arrays (1 quantized)
2018-03-28 18:00:38.386484: W tensorflow/contrib/lite/toco/graph_transformations/hardcode_min_max.cc:131] Skipping min-max setting for {TensorFlowSplit operator with output Test/Model/RNN/RNN/multi_rnn_cell/cell_1/basic_lstm_cell/split} because output Test/Model/RNN/RNN/multi_rnn_cell/cell_1/basic_lstm_cell/split already has min-max.
2018-03-28 18:00:38.386502: W tensorflow/contrib/lite/toco/graph_transformations/hardcode_min_max.cc:131] Skipping min-max setting for {TensorFlowSplit operator with output Test/Model/RNN/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/split} because output Test/Model/RNN/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/split already has min-max.
2018-03-28 18:00:38.386778: F tensorflow/contrib/lite/toco/tooling_util.cc:1432] Array Test/Model/embedding_lookup, which is an input to the TensorFlowReshape operator producing the output array Test/Model/Reshape_1, is lacking min/max data, which is necessary for quantization. Either target a non-quantized output format, or change the input graph to contain min/max information, or pass --default_ranges_min= and --default_ranges_max= if you do not care about the accuracy of results.
Aborted

What is the problem and where am I wrong?

You aren't doing anything wrong.

Currently create_training_graph and create_eval_graph aren't the most robust over various model architectures. We have them working on most CNNs but RNNs are still in progress and pose a different set of challenges.

Depending on the details of the RNN, the approach for quantization right now will be more involved and may require manually putting FakeQuantization operations in the correct places. Specifically in your error message it seems that you need to add a FakeQuantization operation at the embedding_lookup. That being said, the final quantized RNN may run, but I have no idea how the accuracy will look. It really ends up being model and dataset dependent :)

I will update this answer when RNNs are properly supported by the automatic rewrites.