My goal is to run a learned TensorFlow model in real time to control a vehicle. Our vehicle system uses ROS (Robot Operating System), which is closely tied to OpenCV, so I receive the image of interest from ROS as an OpenCV Mat:
cv::Mat cameraImg;
I would like to create a TensorFlow Tensor directly from the data in this OpenCV matrix to avoid the expense of copying the matrix line by line. Using the answer to this question, I have managed to get the forward pass of the network working with the following code:
// Convert to 32-bit float; the three channels are preserved.
cameraImg.convertTo(cameraImg, CV_32FC3);
Tensor inputImg(DT_FLOAT, TensorShape({1, inputheight, inputwidth, 3}));
auto inputImageMapped = inputImg.tensor<float, 4>();
// steady_clock is monotonic, so it is the safer choice for measuring intervals.
auto start = std::chrono::steady_clock::now();
// Copy all the data over, swapping channels 0 and 2 (e.g. BGR to RGB)
for (int y = 0; y < inputheight; ++y) {
    // ptr<float>(y) accounts for any row padding, unlike a manual offset from .data
    const float* source_row = cameraImg.ptr<float>(y);
    for (int x = 0; x < inputwidth; ++x) {
        const float* source_pixel = source_row + (x * 3);
        inputImageMapped(0, y, x, 0) = source_pixel[2];
        inputImageMapped(0, y, x, 1) = source_pixel[1];
        inputImageMapped(0, y, x, 2) = source_pixel[0];
    }
}
auto end = std::chrono::steady_clock::now();
However, with this method the copy into the tensor takes between 80 ms and 130 ms, while the entire forward pass (for a 10-layer convolutional network) takes only 25 ms.
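For comparison, here is a minimal sketch of the same channel-swapping copy written against raw pointers instead of the four-index tensor accessor. It is only an illustration under assumptions: the function name `copyBgrToRgb` and its parameters are hypothetical, and whether it is actually faster depends on the build (the indexed accessor may be slow mainly in unoptimized builds), so it should be timed rather than trusted.

```cpp
#include <cassert>
#include <cstddef>

// Copy an interleaved 3-channel float image while swapping channels 0 and 2.
// `srcRowStride` is the distance between consecutive row starts, in floats,
// so sources with padded rows are handled. `dst` is densely packed
// as [height][width][3].
// Hypothetical helper for illustration; not part of OpenCV or TensorFlow.
void copyBgrToRgb(const float* src, std::size_t srcRowStride,
                  float* dst, int height, int width) {
    assert(src != nullptr && dst != nullptr);
    assert(srcRowStride >= static_cast<std::size_t>(width) * 3);
    for (int y = 0; y < height; ++y) {
        const float* s = src + static_cast<std::size_t>(y) * srcRowStride;
        float* d = dst + static_cast<std::size_t>(y) * width * 3;
        for (int x = 0; x < width; ++x, s += 3, d += 3) {
            d[0] = s[2];
            d[1] = s[1];
            d[2] = s[0];
        }
    }
}
```

In my setting I would expect to pass `cameraImg.ptr<float>(0)` as `src`, `cameraImg.step1()` as `srcRowStride`, and `inputImg.flat<float>().data()` as `dst`, but I have not measured that combination here.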
Looking at the TensorFlow documentation, it appears there is a Tensor constructor that takes an allocator. However, I have not been able to find any TensorFlow or Eigen documentation on this functionality, or on how the Eigen Map class relates to Tensors.
Does anyone have any insight into how this code can be sped up, ideally by re-using my OpenCV memory?
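To make the memory-reuse idea concrete, here is a rough sketch of a non-owning view over an existing interleaved buffer, which is conceptually the role a map over existing memory (such as Eigen's TensorMap) plays. It deliberately uses neither TensorFlow nor OpenCV, and the names `ImageView` and `swapRedBlueInPlace` are hypothetical. It does not show the allocator-based Tensor constructor; it only shows that indexing and the channel swap can operate on memory owned by someone else without a second buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical non-owning view over an interleaved image buffer laid out as
// [height][width][channels]. It never allocates or frees memory; the owner of
// the buffer must outlive the view.
struct ImageView {
    float* data;
    int height;
    int width;
    int channels;

    // Return a reference into the underlying buffer (no copy is made).
    float& at(int y, int x, int c) const {
        assert(y >= 0 && y < height);
        assert(x >= 0 && x < width);
        assert(c >= 0 && c < channels);
        return data[(static_cast<std::size_t>(y) * width + x) * channels + c];
    }
};

// Swap channels 0 and 2 of every pixel in place, writing through the view
// into the original buffer.
void swapRedBlueInPlace(ImageView img) {
    assert(img.channels == 3);
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            std::swap(img.at(y, x, 0), img.at(y, x, 2));
        }
    }
}
```

The trade-off with any such view is lifetime: the consumer must finish with the data before the original owner releases or overwrites it.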
EDIT: I have successfully implemented what @mrry suggested and can now reuse the memory allocated by OpenCV. I have opened GitHub issue 8033 requesting that this be added to the TensorFlow source tree. My method isn't pretty, but it works.
It is still very difficult to compile an external library and link it against the libtensorflow.so library. The TensorFlow CMake build may help with this, but I have not yet tried it.