I had always heard that vectorized code runs faster than for loops in MATLAB. However, when I tried vectorizing my MATLAB code it seemed to run slower.
I used tic
and toc
to measure the times. I changed only the implementation of a single function in my program. My vectorized version ran in 47.228801
seconds and my for-loop version ran in 16.962089
seconds.
Also in my main program I used a large number for N, N = 1000000
and DataSet's size is 1 301
, and I ran each version several times for different data sets with the same size and N.
Why is the vectorized so much slower and how can I improve the speed further?
The "vectorized" version
function [RNGSet] = RNGAnal(N,DataSet)
%Creates a random number generated set of numbers to check accuracy overall
% This function will produce random numbers and normalize a new Data set
% that is derived from an old data set by multiply random numbers and
% then dividing by N/2
randData = randint(N,length(DataSet));
tempData = repmat(DataSet,N,1);
RNGSet = randData .* tempData;
RNGSet = sum(RNGSet,1) / (N/2); % sum and normalize by the N
end
The "for-loop" version
function [RNGData] = RNGAnsys(N,Data)
%RNGAnsys This function produces statistical RNG data using a for loop
% This function will produce RNGData that will be used to plot on another
% plot that possesses the actual data
multData = zeros(N,length(Data));
for i = 1:length(Data)
photAbs = randint(N,1); % Create N number of random 0's or 1's
multData(:,i) = Data(i) * photAbs; % multiply each element in the molar data by the random numbers
end
sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
end
Vectorization
First glance at the for-loop code tells us that since
photAbs
is a binary array each column of which is scaled according to each element ofData
, this binary feature could be used for vectorization. This is abused in the code here -After profiling, it appears that the bottleneck is the random binary array creating part. So, using a faster random binary array creator as suggested in this smart solution, the above function could be further optimized like so -
Using the smart binary random array creator, the original code could be optimized as well, that will be used for a fair benchmarking between optimized for-loop and vectorized codes later on. The optimized for-loop code is listed here -
Benchmarking
Benchmarking Code
Results
Concluding remarks
Based on the experience I had so far with
MATLAB
, neither for loops nor vectorized techniques are fit for all situations, but everything is situation-specific.randData = randint(N,length(DataSet));
allocates a 1.2GB array. (4*301*1000000). Implicitly you create up to 4 of these monsters in your program, causing continuous cache-misses.
You for-loop code could nearly run in the processor cache (or it does on the bigger xeons).
Try using the matlab profiler to determine which line or lines of code are using the most amount of time. That way you can find out if the repmat function is what is slowing you down as is being suggested. Let us know what you find, I'm interested!