I'm currently working on Single Image Super-Resolution, and I've managed to freeze an existing checkpoint file and convert it to TensorFlow Lite. However, when performing inference with the .tflite file, upsampling a single image takes at least 4 times as long as it does when restoring the model from the .ckpt file.
Inference with the .ckpt file is done through session.run(), while inference with the .tflite file is done through interpreter.invoke(). Both operations were run on an Ubuntu 18 VM hosted on a typical PC.
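For reference, the conversion step looked roughly like this. The frozen-graph filename and the input/output array names below are placeholders, not my actual node names:

import tensorflow as tf

# Placeholder file and node names -- substitute the model's real ones.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="frozen_model.pb",
    input_arrays=["x", "x2"],  # low-res input and bicubic-upsampled input
    output_arrays=["y_"],      # super-resolved output
)
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)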
To find out more about the issue, I ran top in a separate terminal to watch the CPU utilization rate while either operation is performed. Utilization hits 270% with the .ckpt file, but stays at around 100% with the .tflite file.
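The same signal can also be captured from inside the script. This is just a sketch assuming psutil is installed; it is not part of my current code:

import psutil

# cpu_percent() measures usage since the previous call; with multiple busy
# threads it can exceed 100%, which is the same convention top uses.
proc = psutil.Process()
proc.cpu_percent(interval=None)  # prime the counter
interpreter.invoke()
print("CPU utilization during invoke: %.0f%%" % proc.cpu_percent(interval=None))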
# Inputs are set first so that only invoke() itself is timed.
interpreter.set_tensor(input_details[0]['index'], input_image_reshaped)
interpreter.set_tensor(input_details[1]['index'], input_bicubic_image_reshaped)
start = time.time()
interpreter.invoke()  # run the super-resolution graph on the .tflite model
end = time.time()
vs
# Single run of the restored .ckpt graph; dropout disabled, inference mode.
y = self.sess.run(
    self.y_,
    feed_dict={
        self.x: image.reshape(1, image.shape[0], image.shape[1], ch),
        self.x2: bicubic_image.reshape(
            1, self.scale * image.shape[0], self.scale * image.shape[1], ch),
        self.dropout: 1.0,
        self.is_training: 0,
    },
)
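To rule out one-off setup costs, the timing could also be averaged over repeated runs, along these lines (bench is a made-up helper, not something from my code):

import time

def bench(fn, warmup=3, iters=20):
    # Warm-up runs exclude one-time costs such as memory allocation.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

print("avg invoke() time: %.3fs" % bench(interpreter.invoke))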
One hypothesis is that TensorFlow Lite is not configured for multithreading; another is that TensorFlow Lite is optimized for ARM processors (rather than the Intel one my computer runs on) and is therefore slower. However, I cannot tell for sure, nor do I know how to trace the root of the issue. Hopefully someone out there is more knowledgeable about this?
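For the multithreading hypothesis specifically, one thing I could try is the num_threads argument that newer TensorFlow versions expose on tf.lite.Interpreter (it may not exist in older builds, so this is a sketch, not something I have verified on my setup):

import tensorflow as tf

# num_threads controls how many threads TFLite kernels may use;
# the argument is only available in newer TensorFlow releases.
interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()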