This is more a question to understand a behavior rather than a specific problem.
Mathworks states that numerical are stored continuous which makes preallocation important. This is not the case for cell arrays.
Are they something similar than vector or array of pointers in C++?
This would mean that prealocation is not so important since a pointer is half the size of a double (according to whos - but there surely is overhead somewhere to store the datatype of the mxArray).
Running this code:
clear all
n = 1e6;
tic
A = [];
for i=1:n
A(end + 1) = 1;
end
fprintf('Numerical without preallocation %f s\n',toc)
clear A
tic
A = zeros(1,n);
for i=1:n
A(i) = 1;
end
fprintf('Numerical with preallocation %f s\n',toc)
clear A
tic
A = cell(0);
for i=1:n
A{end + 1} = 1;
end
fprintf('Cell without preallocation %f s\n',toc)
tic
A = cell(1,n);
for i=1:n
A{i} = 1;
end
fprintf('Cell with preallocation %f s\n',toc)
returns: Numerical without preallocation 0.429240 s Numerical with preallocation 0.025236 s Cell without preallocation 4.960297 s Cell with preallocation 0.554257 s
There is no surprise for the numerical values. But the did surprise me since only the container of the pointers and not the data itself would need reallocation. Which should (since the pointer is smaller than a double) lead to difference of <.2s. Where does this overhead come from?
A related question would be, if I would like to make a data container for heterogeneous data in Matlab (preallocation is not possible since the final size is not known in the beginning). I think handle classes are not good since the also have huge overhead.
already looking forward to learn something
magu_
Edit: I tried out the linked list proposed by Eitan T but I think the overhead from matlab is still rather big. I tried something with an double array as data (rand(200000,1)).
I made a little plot to illustrate:
code for the graph: (I used the dlnode class from the matlab hompage as stated in the answering post)
D = rand(200000,1);
s = linspace(10,20000,50);
nC = zeros(50,1);
nL = zeros(50,1);
for i = 1:50
a = cell(0);
tic
for ii = 1:s(i)
a{end + 1} = D;
end
nC(i) = toc;
a = list([]);
tic
for ii = 1:s(i)
a.insertAfter(list(D));
end
nL(i) = toc;
end
figure
plot(s,nC,'r',s,nL,'g')
xlabel('#iter')
ylabel('time (s)')
legend({'cell' 'list'})
Don't get me wrong I love the idea of linked list, since there are rather flexible, but I think the overhead might be to big.