I wanted to do some regular expressions in C++ so I looked on the interwebz (yes, I am a beginner/intermediate with C++) and found this SO answer.
I really don't know what to choose between boost::regex and boost::xpressive. What are the pros/cons?
I also read that boost::xpressive, as opposed to boost::regex, is a header-only library. Is it hard to statically compile boost::regex on Linux and Windows (I almost always write cross-platform applications)?
I'm also interested in comparisons of compile time. I have a current implementation using boost::xpressive and I'm not too content with the compile times (but I have no comparisons to boost::regex).
Of course I'm open to other suggestions for regex implementations too. The requirements are free (as in beer) and compatible with http://nclabs.org/license.php.
Well, if you need to create a regular expression at runtime (i.e. letting the user type in a regular expression to search for), you can't use xpressive's static regex syntax, as it is compile-time only; you would have to fall back on its dynamic sregex::compile interface.
On the other hand, since a static xpressive regex is a compile-time construct, it should benefit more from your optimizer than Boost.Regex does.
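For the run-time case described above, here is a minimal sketch of accepting a pattern that is only known when the program runs. It uses the standard library's std::regex as a stand-in so that it builds without Boost; the helper names is_valid_pattern and contains_match are hypothetical, chosen only for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles as a regex
// (default ECMAScript grammar). The pattern is parsed at run time, so a
// malformed one is reported via std::regex_error, not by the compiler.
bool is_valid_pattern(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical helper: returns true if `text` contains a match for the
// user-supplied `pattern`. An invalid pattern is treated as "no match".
bool contains_match(const std::string& text, const std::string& pattern) {
    try {
        const std::regex re(pattern);
        return std::regex_search(text, re);
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling contains_match(line, user_input) with a string read from the user is the situation where a purely compile-time regex cannot help, because the pattern does not exist yet when the program is built.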
I do enough stuff with Boost.MPL, StateChart, and Spirit that 220KB of compiler warnings and errors doesn't really bother me much. If that sounds like hell to you, stick with Boost.Regex.
If you do use xpressive, I highly recommend turning on -Wfatal-errors, as this will stop compilation (and further errors) after the first 'error:' line.
For compilation time, it's no contest. Boost.Regex will be faster*. The fact that xpressive uses MPL will cause compile times to be dramatically increased.
*This assumes you only build the dll/so once
One fairly important difference is that Boost Regex can support linking to ICU for Unicode support (character classes, etc.); see Boost Regex ICU Support.
As far as I can tell, Boost Xpressive doesn't have this kind of support built-in.
When using the Boost libraries I tend to lean toward the use of header-only libraries, due to cross-platform compatibility issues. The downside is that when your compiler reports an error related to your use of the library, the output from a header-only library tends toward the arcane.
Assuming you're using a reasonably recent compiler, there's a pretty decent chance that it includes a regex package already. Try just doing #include <regex> and see if the compiler finds it.
The only trick is that it could be in either (or both) of two different namespaces. Regexes were included in TR1 of the C++ standard, and are also in (the final drafts of) C++11. The TR1 version is in a namespace named tr1 (nested inside std, i.e. std::tr1), whereas the standard version is directly in std, just like the rest of the library.
FWIW, this is essentially the same as Boost regex, not Boost Xpressive.
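As a rough sketch of what using the standard <regex> header looks like, the snippet below pulls the numeric part out of a "key=123" style string. The function name extract_value and the pattern are made up for illustration. Since the standard interface was modeled on Boost.Regex, swapping the header for boost/regex.hpp and std:: for boost:: should give very similar code, though I have not checked every detail of that port.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: for input of the form "name=digits", return the
// digits; return an empty string if the whole line does not match.
std::string extract_value(const std::string& line) {
    // Group 1 is the key, group 2 is the numeric value.
    static const std::regex re("(\\w+)=(\\d+)");
    std::smatch m;
    if (std::regex_match(line, m, re)) {
        return m[2].str();
    }
    return std::string();
}
```

On an older compiler that only ships TR1, the same types would be found under std::tr1 (typically via #include <tr1/regex> on GCC), so a namespace alias is one way to keep the rest of the code unchanged.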
I will try to supplement the other answers by going deeper into the topic of compile-time regular expressions (CTR) vs. run-time (dynamic) regular expressions (RTR) in a more theoretical way (this topic is indirectly implied by the OP's question, IMHO). Run-time regexes are better known and more popular (they are what most language core libraries implement), I suppose for historical reasons. They are fine when the regular expression is only determined at run time, unlike CTR. Both work on a finite state machine basis.
RTR are "compiled" and then interpreted by a kind of universal finite state machine ("universal" meaning it is an interpreter whose scheme is supplied at run time: when you pass the regex string, it is "compiled" into some internal data structure, which is then interpreted at run time).
CTR, by contrast, are "compiled" at compile time and are specific to one particular regex, so you can't use them when the regex is supplied at run time (applications like text editors and file/internet search engines).
But they are a priori more efficient (in theory, at least), since a finite state machine customized at compile time should be more efficient than an interpreter driven by a table-preset scheme of that machine (similar cases are reflection field access vs. compile-time access, or a function specialized and optimized for some fixed parameter, as pointed out there). Another advantage is compile-time syntax checking. CTR can be implemented through meta-programming and/or code generation.
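To make the distinction concrete, here is a small hand-written sketch for the pattern [0-9]+\.[0-9]+ (one or more digits, a dot, one or more digits). The first function hands the pattern to a general-purpose run-time engine (std::regex); the second hard-codes the state machine for that one pattern, which is roughly the kind of code a CTR library or a code generator aims to produce. It is only an illustration of the idea, not how any particular library implements it, and the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Run-time approach: the pattern string is parsed into an internal
// structure when the regex object is constructed, then interpreted.
bool is_decimal_runtime(const std::string& s) {
    static const std::regex re("[0-9]+\\.[0-9]+");
    return std::regex_match(s, re);
}

// Fixed approach: the state machine for this one pattern is written out
// as ordinary code, so the compiler can optimize it like any other loop.
bool is_decimal_fixed(const std::string& s) {
    enum State { Start, IntPart, Dot, FracPart };
    State state = Start;
    for (char c : s) {
        const bool digit = (c >= '0' && c <= '9');
        switch (state) {
            case Start:
                if (!digit) return false;
                state = IntPart;
                break;
            case IntPart:
                if (c == '.') state = Dot;
                else if (!digit) return false;
                break;
            case Dot:
                if (!digit) return false;
                state = FracPart;
                break;
            case FracPart:
                if (!digit) return false;
                break;
        }
    }
    // Only FracPart is an accepting state.
    return state == FracPart;
}
```

Both functions accept the same strings; the difference is where the pattern lives. In the first it is data handed to an engine, in the second it is baked into the control flow and cannot be changed without recompiling.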
As for specific implementations, there are many RTR but far fewer CTR. For C++, they are the above-mentioned Boost and C++11 (formerly C++0x) standard library implementations. You may need CTR to optimize regex performance, size of generated code, or memory usage, which is mostly relevant for embedded systems or specific high-performance applications.
SO question about CTR
Finding CTR implementations is harder. The examples I found are the Re2C code generator project, a Java CTR implementation, and a C# implementation featuring run-time compilation of Regex (into IL code rather than an internal data structure) [there is an SO question about it].
P.S. Sorry, couldn't post some relevant links due to reputation