Summary: This question deals with improving an algorithm for computing many linear regressions.
I have a 3D array (`dlMAT`) representing monochrome photographs of the same scene taken at different exposure times (the vector `IT`). Mathematically, every vector along the 3rd dimension of `dlMAT` represents a separate linear regression problem that needs to be solved. The equation whose coefficients need to be estimated is of the form:
`DL = R*IT^P`

where `DL` and `IT` are obtained experimentally, and `R` and `P` must be estimated.
The above equation can be transformed into a simple linear model by taking logarithms:

`log(DL) = log(R) + P*log(IT)  =>  y = a + b*x`
Presented below is the most "naive" way to solve this system of equations, which essentially amounts to iterating over all "3rd-dimension vectors" and fitting a 1st-order polynomial to (`IT`, `DL(ind1,ind2,:)`):
```matlab
%// Define some nominal values:
R = 0.3;
IT = 600:600:3000;
P = 0.97;
%// Impose some believable spatial variations:
pMAT = 0.01*randn(3) + P;
rMAT = 0.1*randn(3) + R;
%// Generate "fake" observation data:
dlMAT = bsxfun(@times, rMAT, bsxfun(@power, permute(IT,[3,1,2]), pMAT));
%// Regression:
sol = cell(size(rMAT)); %// preallocation
for ind1 = 1:size(dlMAT,1)
    for ind2 = 1:size(dlMAT,2)
        sol{ind1,ind2} = polyfit(log(IT(:)), log(squeeze(dlMAT(ind1,ind2,:))), 1);
    end
end
fittedP = cellfun(@(x)x(1), sol);      %// Estimate of pMAT
fittedR = cellfun(@(x)exp(x(2)), sol); %// Estimate of rMAT
```
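Since the synthetic data above is noise-free, the fit should recover the inputs essentially exactly. A quick sanity check (illustrative, not part of the original script):

```matlab
%// Both differences should be on the order of machine precision,
%// since the synthetic data lies exactly on the model curve:
max(abs(fittedP(:) - pMAT(:)))
max(abs(fittedR(:) - rMAT(:)))
```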
The above approach seems like a good candidate for vectorization, since it does not exploit MATLAB's main strength, namely MATrix operations. For this reason it does not scale well, and it takes much longer to execute than I think it should.
There exist alternative ways to perform this computation based on matrix division, as demonstrated here and here, which involve something like this:

`sol = [ones(size(x)), log(x)] \ log(y);`

That is, prepending a column of `1`s to the observations, followed by `mldivide` to solve the equation system.
The main challenge I'm facing is how to adapt my data to the algorithm (or vice versa).
Question #1: How can the matrix-division-based solution be extended to solve the problem presented above (and potentially replace the loops I am using)?
Question #2 (bonus): What is the principle behind this matrix-division-based solution?
The secret ingredient behind the solution that involves matrix division is the Vandermonde matrix. The question discusses a linear problem (linear regression), and such problems can always be formulated as a matrix equation, which `\` (`mldivide`) can solve in a mean-square-error (least-squares) sense. Such an algorithm, solving a similar problem, is demonstrated and explained in this answer.

Below is benchmarking code that compares the original solution with two alternatives suggested in chat (1, 2):
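A sketch of the two alternatives (Method 1 being the `polyfit` loop from the question); variable names and details are illustrative and may differ from the chat versions:

```matlab
%// Method 2: mldivide per pixel (same loop structure as Method 1, no polyfit)
A = [ones(numel(IT),1), log(IT(:))];  %// shared design ("Vandermonde") matrix
sol2 = cell(size(dlMAT,1), size(dlMAT,2));
for ind1 = 1:size(dlMAT,1)
    for ind2 = 1:size(dlMAT,2)
        %// flipud reorders the coefficients to polyfit's [slope; intercept]
        sol2{ind1,ind2} = flipud(A \ log(squeeze(dlMAT(ind1,ind2,:))));
    end
end

%// Method 3: a single mldivide call for all pixels at once
Y = log(reshape(dlMAT, [], numel(IT)).'); %// each column = one pixel's observations
sol3 = A \ Y;                             %// 2-by-numel(pixels): [intercept; slope]
fittedP3 = reshape(sol3(2,:), size(dlMAT,1), size(dlMAT,2));
fittedR3 = reshape(exp(sol3(1,:)), size(dlMAT,1), size(dlMAT,2));
```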
The reason why method 2 can be vectorized into method 3 is essentially that matrix multiplication can be separated by the columns of the second matrix. If `A*B` produces matrix `X`, then by definition `A*B(:,n)` gives `X(:,n)` for any `n`. Moving `A` to the right-hand side with `mldivide`, this means that the divisions `A\X(:,n)` can be done in one go for all `n` with `A\X`. The same holds for an overdetermined system (a linear regression problem), in which there is in general no exact solution, and `mldivide` finds the matrix that minimizes the mean-square error. In this case too, the operations `A\X(:,n)` (method 2) can be done in one go for all `n` with `A\X` (method 3).
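A quick numerical illustration of this column-separation property (a standalone sketch, not taken from the original benchmark):

```matlab
%// Column-by-column least squares vs. one batched mldivide call:
A = randn(5,2);               %// overdetermined: 5 equations, 2 unknowns
X = randn(5,3);               %// three right-hand-side columns
sol_loop = zeros(2,3);
for n = 1:3
    sol_loop(:,n) = A \ X(:,n);       %// "method 2" style
end
sol_batch = A \ X;                    %// "method 3" style
max(abs(sol_loop(:) - sol_batch(:)))  %// agrees up to round-off
```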
The implications of improving the algorithm when increasing the size of `dlMAT` can be seen below:

(runtime-vs-size plot comparing Method 1, Method 2, and Method 3)

For the case of 500*500 (or 2.5E5) elements, the speedup from Method 1 to Method 3 is about x3500!

It is also interesting to observe the output of `profile` (here, for the case of 500*500):

(profiler screenshots for Method 1, Method 2, and Method 3)
From the above it is seen that rearranging the elements via `squeeze` and `flipud` takes up about half (!) of the runtime of Method 2. It is also seen that some time is lost on converting the solution from cells to matrices. Since the 3rd solution avoids all of these pitfalls, as well as the loops altogether (which mostly means re-evaluation of the script on every iteration), it unsurprisingly results in a considerable speedup.
Notes:

- One variant of Method 3 was rejected in favor of the "explicit" version shown here; for this reason it was not included in the comparison.
- Several variables in Method 3 were `gpuArray`-ed. This did not provide improved performance (and even degraded it somewhat), possibly due to a wrong implementation, or the overhead associated with copying matrices back and forth between RAM and VRAM.
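For reference, such a `gpuArray` attempt might look roughly like this (a sketch assuming the Parallel Computing Toolbox; not the exact code that was benchmarked):

```matlab
%// Hypothetical gpuArray variant of Method 3:
A = gpuArray([ones(numel(IT),1), log(IT(:))]);
Y = gpuArray(log(reshape(dlMAT, [], numel(IT)).'));
sol = A \ Y;          %// mldivide is overloaded for gpuArray inputs
sol = gather(sol);    %// copy the result back from VRAM to RAM
```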