I've read this but I still don't understand why vectorized code is faster.
In for loops, I can use parfor for parallel computation. If vectorized code is faster, does it mean that it is automatically parallelized?
No. You're mixing two important concepts: vectorization and parallelization.
Consider for example a trivial case such as the following:
s = 0;
for i = 1:length(v)
    s = s + v(i);
end
and
sum(v)
You should probably use tic and toc to time these two versions to convince yourself of the difference in runtime. There are about 10 similar commonly used functions that operate on vectors; examples are bsxfun, repmat, length, and find. Vectorization is a standard part of using MATLAB effectively. Until you can vectorize code effectively, you're just a tourist in the world of MATLAB, not a citizen.
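To make the tic/toc comparison concrete, here is a minimal sketch. The vector size and the variable names (tLoop, tVec, sVec) are my own assumptions for illustration, and the timings you see will depend on your machine and MATLAB release.

```matlab
% Hypothetical test vector; the size is an arbitrary choice.
v = rand(1e6, 1);

% Loop version: accumulate one element at a time.
tic;
s = 0;
for i = 1:numel(v)
    s = s + v(i);
end
tLoop = toc;

% Vectorized version: a single call to the built-in sum.
tic;
sVec = sum(v);
tVec = toc;

fprintf('loop: %.4f s, sum: %.4f s\n', tLoop, tVec);

% The two results should agree up to floating-point rounding.
fprintf('relative difference: %.2e\n', abs(s - sVec) / abs(sVec));
```

A single tic/toc measurement is noisy, so it is worth running the script a few times before drawing conclusions.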
While parfor can help a lot in many cases, the kind of loop that can be parfor-ed for very large gains occurs only seldom.
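As a rough sketch of the kind of loop where parfor tends to pay off: each iteration is independent of the others and does a fair amount of work. The workload below (eigenvalues of a random matrix) and the names n, r and A are hypothetical, chosen only for illustration. It assumes the Parallel Computing Toolbox is available; without it, as far as I know, parfor simply runs like an ordinary loop.

```matlab
% Hypothetical workload: n independent, fairly expensive iterations.
n = 8;
r = zeros(1, n);

parfor k = 1:n
    A = rand(200);             % data local to this iteration
    r(k) = max(abs(eig(A)));   % costly per-iteration computation
end

disp(r);
```

By contrast, the summation loop from the question does almost no work per iteration, so the overhead of distributing iterations would likely outweigh any gain.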
I agree with carlosdc's answer. However, it is important to remember that MATLAB has included a JIT compiler since release 6.5 for speeding up for-loops and the like.
I made a quick test of your sum example with a million elements in v and got the following results:
sum(v): 4.3 ms
for-loop version: 16 ms
for-loop version, no JIT: 966 ms
The JIT can be turned on and off like this:
feature accel off
feature accel on
A factor of about 4 gained by vectorizing code is of course still often worth it, but for-loops shouldn't be feared as they once were for problems where they are otherwise a good solution. That said, a piece of well-vectorized code can often be simpler, less error-prone, and faster at the same time.
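To illustrate the point about simplicity, here is a small sketch that sums the squares of the positive elements of a vector, once with a loop and once vectorized. The data and the names total and totalVec are made up for the example.

```matlab
v = [3 -1 4 -1 5 -9 2 6];   % hypothetical data

% Loop version: explicit index, condition, and accumulator.
total = 0;
for i = 1:numel(v)
    if v(i) > 0
        total = total + v(i)^2;
    end
end

% Vectorized version: logical indexing plus element-wise power.
totalVec = sum(v(v > 0).^2);

% Both give 90 for this input (9 + 16 + 25 + 4 + 36).
fprintf('loop: %d, vectorized: %d\n', total, totalVec);
```

The vectorized line has no index variable or accumulator to get wrong, which is where much of the "less error-prone" benefit comes from.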
In modern computers, the registers (temporary memory used for math, among other uses) have many bits and can manipulate multiple numbers together. For example, if your data is uint8 (8 bits), you can add a number to one element per CPU clock, or you can pack 8 of them together into a register and add a number to all of them in one CPU clock. This way you can work up to 8 times faster than the for-loop.
This is parallelization in a sense, but not like parfor: parfor uses multiple cores of your CPU, whereas the method above uses a single core more efficiently. If you use both, you can achieve even higher speeds.
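The uint8 case can be written in MATLAB as a single array operation, sketched below next to the equivalent loop. Whether MATLAB actually executes the array form with packed register instructions is an internal detail I am assuming rather than demonstrating; the code only shows the two ways of expressing the same computation. The names x, y and yLoop are hypothetical.

```matlab
x = uint8(0:7);     % eight 8-bit values, 64 bits in total

% Array form: one statement adds 10 to every element.
y = x + 10;

% Loop form: one element per iteration.
yLoop = zeros(size(x), 'uint8');
for i = 1:numel(x)
    yLoop(i) = x(i) + 10;
end

disp(y);                  % values 10 through 17
disp(isequal(y, yLoop));  % the two forms agree
```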