My question is the title.
Why is str2double so slow in MATLAB compared to a MEX function written in C/C++? Does MATLAB just not have good string handling capabilities?
Can anyone give me some factual reasons as to why a MEX function runs so many orders of magnitude faster? I was hoping to do a running time analysis of this difference, but I don't have any concrete reasons from the MATLAB code.
Can you explain how I would open the file and actually look at the code written for the built-in MATLAB str2double function?
Some postings on the topic:
http://www.mathworks.com/matlabcentral/fileexchange/28893-fast-string-to-double-conversion
I don't understand what the poster means when they try to explain how this function runs more quickly. For instance, what does this mean (str2doubleq is the MEX function written in C++):
"str2doubleq exploits the mex-gateway to use c++ fast string handling capabilities and the std::stringstream properties. The conversion uses same ideas that is used in boost::lexical_cast"
No one can answer this?
1) str2double is a pretty complex function that can convert many different formats. I guess that your MEX implementation is much simpler, and that can explain why it's faster. Some examples are provided in the MATLAB help:
str2double('123.45e7')
str2double('123 + 45i')
str2double('3.14159')
str2double('2.7i - 3.14')
str2double({'2.71' '3.1415'})
str2double('1,200.34')
2) Why are MEX files faster? When you run a standard m-file, another program (an interpreter) reads your code and executes it, so there are two layers. When you write a MEX file, however, you compile it directly to machine code that the processor runs itself. There is only one layer, so it's faster. For more details, see the Wikipedia articles:
http://en.wikipedia.org/wiki/Interpreted_language
http://en.wikipedia.org/wiki/Compiled_language
3) You cannot see the code of str2double because it is compiled. MathWorks does not provide the code of this function. You can execute it, but not read it. This is the same for all built-in functions.
The implementation of str2double is not hidden from you. To see it, type edit str2double.m. You can also run the profiler on your code to see where in the function all the time is being spent.
Looking at the function, my guess is that it is slow because sscanf is being called inside a loop. One of the commenters in the File Exchange link you posted suggested using the following code to take advantage of sscanf being vectorized:
d = reshape(sscanf(sprintf('%s#', c{:}), '%g#'), size(c));
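This is actually much faster than str2double for a cell array. Note that it assumes every string in c is a valid real number: sscanf stops at the first entry it cannot parse, so unlike str2double you do not get NaN for bad entries, and the reshape will fail because the element count no longer matches.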