Windows XP SP3, Core 2 Duo 2.0 GHz. I'm finding the performance of boost::lexical_cast to be extremely slow and want to find ways to speed up the code. Using /O2 optimizations on Visual C++ 2008 and comparing with Java 1.6 and Python 2.6.2, I see the following results.
Integer casting:
C++:

std::string s;
for (int i = 0; i < 10000000; ++i)
{
    s = boost::lexical_cast<std::string>(i);
}
Java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
s = new Integer(i).toString();
}
Python:

for i in xrange(1,10000000):
    s = str(i)
The times I'm seeing are
c++: 6700 milliseconds
java: 1178 milliseconds
python: 6702 milliseconds
C++ is as slow as Python and 6 times slower than Java.
Double casting:
C++:

std::string s;
for (int i = 0; i < 10000000; ++i)
{
    double d = i * 1.0;
    s = boost::lexical_cast<std::string>(d);
}
Java:
String s = new String();
for(int i = 0; i < 10000000; ++i)
{
double d = i*1.0;
s = new Double(d).toString();
}
Python:

for i in xrange(1,10000000):
    d = i*1.0
    s = str(d)
The times I'm seeing are
c++: 56129 milliseconds
java: 2852 milliseconds
python: 30780 milliseconds
So for doubles, C++ is actually half the speed of Python and 20 times slower than the Java solution! Any ideas on improving the boost::lexical_cast performance? Does this stem from a poor stringstream implementation, or can we expect a general 10x decrease in performance from using the Boost libraries?
Unfortunately I don't have enough rep yet to comment...
lexical_cast is not primarily slow because it's generic (template lookups happen at compile-time, so virtual function calls or other lookups/dereferences aren't necessary). lexical_cast is, in my opinion, slow because it builds on C++ iostreams, which are primarily intended for streaming operations and not single conversions, and because lexical_cast must check for and convert iostream error signals. Thus:

- a stream buffer has to be created and filled, and then its contents copied back out (more work than sprintf does, though sprintf won't safely handle buffer overruns)
- lexical_cast has to check for stringstream errors (ss.fail()) in order to throw exceptions on conversion failures

lexical_cast is nice because (IMO) exceptions allow trapping all errors without extra effort and because it has a uniform prototype. I don't personally see why either of these properties necessitates slow operation (when no conversion errors occur), though I don't know of such C++ functions which are fast (possibly Spirit or boost::xpressive?).

Edit: I just found a message mentioning the use of BOOST_LEXICAL_CAST_ASSUME_C_LOCALE to enable an "itoa" optimisation: http://old.nabble.com/lexical_cast-optimization-td20817583.html. There's also a linked article with a bit more detail.
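For illustration, usage would look something like the sketch below (an assumption on my part: it requires a Boost version that honors this macro, as discussed in the linked thread, and the define must come before the header is included):

// Assumed: a Boost version that honors this macro (see the linked thread).
// It must be defined before boost/lexical_cast.hpp is included.
#define BOOST_LEXICAL_CAST_ASSUME_C_LOCALE
#include <boost/lexical_cast.hpp>
#include <string>

int main()
{
    // With the macro defined, lexical_cast can skip locale handling and
    // take an itoa-style fast path for integer conversions.
    std::string s = boost::lexical_cast<std::string>(42);
    return 0;
}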
lexical_cast is more general than the specific code you're using in Java and Python. It's not surprising that a general approach that works in many scenarios (lexical cast is little more than streaming out then back in to and from a temporary stream) ends up being slower than specific routines.

(BTW, you may get better performance out of Java using the static version, Integer.toString(int). [1])

Finally, string parsing and deparsing is usually not that performance-sensitive, unless one is writing a compiler, in which case lexical_cast is probably too general-purpose, and integers etc. will be calculated as each digit is scanned.

[1] Commenter "stepancheg" doubted my hint that the static version may give better performance. Here's the source I used:
The runtimes, using JDK 1.6.0-14, server VM:
And in client VM:
Even though, theoretically, escape analysis may permit allocation on the stack, and inlining may bring all the code (including copying) into the local method, permitting the elimination of redundant copying, such analysis may take quite a lot of time and result in quite a bit of code space, which has other costs in the code cache that don't justify themselves in real code, as opposed to microbenchmarks like the ones seen here.
lexical_cast may or may not be as slow in relation to Java and Python as your benchmarks indicate, because your benchmark measurements may have a subtle problem. Any workspace allocations/deallocations done by lexical_cast or the iostream methods it uses are measured by your benchmarks, because C++ doesn't defer these operations. However, in the case of Java and Python, the associated deallocations may in fact have simply been deferred to a future garbage collection cycle and missed by the benchmark measurements. (Unless a GC cycle by chance occurs while the benchmark is in progress, in which case you'd be measuring too much.) So it's hard to know for sure, without examining the specifics of the Java and Python implementations, how much "cost" should be attributed to the deferred GC burden that may (or may not) eventually be imposed.
This kind of issue obviously may apply to many other C++ vs garbage collected language benchmarks.
What lexical cast is doing in your code can be simplified to this:
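A minimal sketch of that simplification (the Cast helper name is illustrative):

#include <sstream>
#include <string>

// Roughly what lexical_cast<std::string>(int) boils down to.
std::string Cast(int i)
{
    std::ostringstream os; // construct a stream, possibly allocating
    os << i;               // format the integer into the stream's buffer
    return os.str();       // copy the buffer out into a new string
}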
There is unfortunately a lot going on every time you call Cast():

- a string stream is created, possibly allocating memory
- operator << is called for the integer, formatting it into the stream's buffer, possibly allocating again
- a string copy is taken from the stream, and (possibly) a further copy is made for the return value
- all of that memory is then deallocated
Then in your own code, the assignment s = Cast(i); performs further allocations and deallocations. You may be able to reduce this slightly by writing std::string s = Cast(i); instead, so the string is constructed directly from the returned value.
However, if performance is really important to you, you should consider using a different mechanism. You could write your own version of Cast() which (for example) creates a static stringstream. Such a version would not be thread safe, but that might not matter for your specific needs.
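A sketch of that idea (the FastCast name is illustrative; as noted, the static stream makes it unsafe to call from multiple threads):

#include <sstream>
#include <string>

// Reuses one stream across calls, avoiding per-call construction.
// NOT thread safe: the static stream is shared mutable state.
std::string FastCast(int i)
{
    static std::ostringstream os;
    os.str("");  // clear previous contents, keeping the allocated buffer
    os.clear();  // reset any error flags
    os << i;
    return os.str();
}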
To summarise, lexical_cast is a convenient and useful feature, but such convenience comes (as it always must) with trade-offs in other areas.
Edit 2012-04-11
rve quite rightly commented about lexical_cast's performance, providing a link:
http://www.boost.org/doc/libs/1_49_0/doc/html/boost_lexical_cast/performance.html
I don't have access right now to boost 1.49, but I do remember making my code faster on an older version, so I guess lexical_cast's performance has been actively improved since this question was asked, and the answer below still applies in spirit.
Original answer
Just to add info on Barry's and Motti's excellent answers:
Some background
Please remember Boost is written by the best C++ developers on this planet, and reviewed by the same best developers. If lexical_cast was so wrong, someone would have hacked the library, either with criticism or with code. I guess you missed the point of lexical_cast's real value...

Comparing apples and oranges.
In Java, you are casting an integer into a Java String. You'll note I'm not talking about an array of characters, or a user defined string. You'll note, too, I'm not talking about your user-defined integer. I'm talking about strict Java Integer and strict Java String.
In Python, you are more or less doing the same.
As said by other posts, you are, in essence, using the Java and Python equivalents of sprintf (or the less standard itoa).

In C++, you are using a very powerful cast. Not powerful in the sense of raw speed performance (if you want speed, perhaps sprintf would be better suited), but powerful in the sense of extensibility.

Comparing apples.
If you want to compare a Java Integer.toString method, then you should compare it with either the C sprintf or the C++ ostream facilities.

The C++ stream solution would be 6 times faster (on my g++) than lexical_cast, and quite a bit less extensible. The C sprintf solution would be 8 times faster (on my g++) than lexical_cast, but a lot less safe. Both solutions are either as fast as or faster than your Java solution (according to your data); sketches of both follow.
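Minimal sketches of the two alternatives (the helper names and out-parameter signatures here are illustrative):

#include <cstdio>
#include <sstream>
#include <string>

// C++ stream version: safe and extensible, but pays for stream construction.
inline void toStringStream(const int value, std::string & output)
{
    std::ostringstream x;
    x << value;
    output = x.str();
}

// C sprintf version: fast, but buffer sizing is entirely your responsibility.
inline void toStringSprintf(const int value, std::string & output)
{
    // The longest 32-bit int is "-2147483648" (11 chars); +1 for the NUL.
    char buffer[12];
    std::sprintf(buffer, "%i", value);
    output = buffer;
}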
Comparing oranges.
If you want to compare a C++ lexical_cast, then you should compare it with Java pseudo code for a fully generic conversion, something like Target t = Convert<Target, Source>(s), with Source and Target being of whatever type you want, including built-in types like boolean or int, which is possible in C++ because of templates.

Extensibility? Is that a dirty word?
No, but it has a well known cost: When written by the same coder, general solutions to specific problems are usually slower than specific solutions written for their specific problems.
In the current case, from a naive viewpoint, lexical_cast will use the stream facilities to convert from a type A into a string stream, and then from this string stream into a type B.

This means that as long as your object can be output into a stream, and input from a stream, you'll be able to use lexical_cast on it, without touching any single line of code.
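For instance, a sketch with a made-up streamable type (the Gadget type and its operators are purely illustrative):

#include <iostream>
#include <string>
#include <boost/lexical_cast.hpp>

// A user-defined type that knows how to stream itself in and out.
struct Gadget
{
    int id;
};

std::ostream & operator<<(std::ostream & os, const Gadget & g)
{
    return os << g.id;
}

std::istream & operator>>(std::istream & is, Gadget & g)
{
    return is >> g.id;
}

int main()
{
    // Because Gadget is streamable, lexical_cast works on it unchanged.
    Gadget g = boost::lexical_cast<Gadget>("42");
    std::string s = boost::lexical_cast<std::string>(g);
    std::cout << s << std::endl; // prints 42
    return 0;
}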
So, what are the uses of lexical_cast?

The main uses of lexical casting are:

1. ease of use (a cast that works on everything that is a value)
2. combining it with template-heavy code, where your types are parametrized, and as such you don't want to deal with the specifics of each conversion

Point 2 is very, very important here, because it means we have one and only one interface/function to cast a value of a type into an equal or similar value of another type.

This is the real point you missed, and this is the point that costs in performance terms.
But it's so slooooooowwww!
If you want raw speed performance, remember you're dealing with C++, and that you have a lot of facilities to handle conversion efficiently while still keeping the lexical_cast ease-of-use feature.

It took me some minutes to look at the lexical_cast source and come up with a viable solution. Add to your C++ code something along the lines of the following:
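A sketch reconstructing the kind of specialization described (not the exact original listing; it short-circuits the int-to-string case with sprintf whenever the guard macro is defined):

#include <cstdio>
#include <string>
#include <boost/lexical_cast.hpp>

#ifdef SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT

namespace boost
{
    // Full specialization: bypasses the generic stream-based implementation
    // for the common int -> std::string conversion.
    template<>
    std::string lexical_cast<std::string, int>(const int & arg)
    {
        // The longest 32-bit int is "-2147483648" (11 chars); +1 for the NUL.
        char buffer[12];
        std::sprintf(buffer, "%d", arg);
        return buffer;
    }
}

#endif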
By enabling this specialization of lexical_cast for strings and ints (by defining the macro SPECIALIZE_BOOST_LEXICAL_CAST_FOR_STRING_AND_INT), my code went 5 times faster on my g++ compiler, which means, according to your data, its performance should be similar to Java's.

And it took me only 10 minutes of looking at the boost code to write a remotely efficient and correct 32-bit version. With some work, it could probably go faster and safer (if we had direct write access to the std::string internal buffer, we could avoid a temporary external buffer, for example).

If speed is a concern, or you are just interested in how fast such casts can be with C++, there's an interesting thread regarding it.
Boost.Spirit 2.1 (which is to be released with Boost 1.40) seems to be very fast, even faster than the C equivalents (strtol(), atoi(), etc.).
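For a taste of that approach, here is a minimal sketch using Spirit's Qi integer parser (assuming Boost 1.40 or later):

#include <string>
#include <boost/spirit/include/qi.hpp>

int main()
{
    std::string input = "12345";
    std::string::const_iterator first = input.begin();
    std::string::const_iterator last = input.end();

    int value = 0;
    // qi::int_ parses the integer straight out of the character range,
    // with no stream, locale, or error-flag machinery involved.
    bool ok = boost::spirit::qi::parse(first, last, boost::spirit::qi::int_, value);

    return (ok && first == last) ? 0 : 1;
}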