Note: I've posted this also on Eigen forum here
I want to premultiply 3xN matrices by a 3x3 matrix, i.e., to transform 3D points, like p_dest = T * p_source
after initializing the matrices:
Eigen::Matrix<double, 3, Eigen::Dynamic> points = Eigen::Matrix<double, 3, Eigen::Dynamic>::Random(3, NUMCOLS);
Eigen::Matrix<double, 3, Eigen::Dynamic> dest = Eigen::Matrix<double, 3, Eigen::Dynamic>(3, NUMCOLS);
int NT = 100;
I have evaluated this two versions
// eigen direct multiplication
for (int i = 0; i < NT; i++){
Eigen::Matrix3d T = Eigen::Matrix3d::Random();
dest.noalias() = T * points;
}
and
// col multiplication
for (int i = 0; i < NT; i++){
Eigen::Matrix3d T = Eigen::Matrix3d::Random();
for (int c = 0; c < points.cols(); c++){
dest.col(c) = T * points.col(c);
}
}
the NT repetition are done just to compute average time
I am surprised the the column by column multiplication is about 4/5 time faster than the direct multiplication
(and the direct multiplication is even slower if I do not use the .noalias()
, but this is fine since it is doing a temporary copy)
I've tried to change NUMCOLS from 0 to 1000000 and the relation is linear.
I'm using Visual Studio 2013 and compiling in release
The next figure shows on X the number of columns of the matrix and in Y the avg time for a single operation, in blue the col by col multiplication, in red the matrix multiplication
Any suggestion why this happens?
Short answer
You're timing the lazy (and therefore lack of) evaluation in the col multiplication version, vs. the lazy (but evaluated) evaluation in the direct version.
Long answer
Instead of code snippets, let's look at a full MCVE. First, "you're" version:
With this code (and SSE turned on), I get:
Same 4-5 slowdown you complained of. Why?!?! Before we get to the answer, let's modify the code a bit so that the
dest
matrix is sent to anostream
. Addstd::ostream outPut(0);
to the beginning ofmain()
and before ending the timers addoutPut << dest << "\n\n";
andoutPut << dest2 << "\n\n";
. Thestd::ostream outPut(0);
doesn't output anything (I'm pretty sure the badbit is set), but it does cause Eigensoperator<<
to be called, which forces the evaluation of the matrix.NOTE: if we used
outPut << dest(1,1)
thendest
would be evaluated only enough to output the single element in the col multiplication method.We then get
as a result as expected. Note that the Eigen direct method took the exact(ish) same time (meaning the evaluation took place even without the added
ostream
), whereas the col method all of the sudden took much longer.