I want to convert a std::string to lowercase. I am aware of the function tolower(); however, in the past I have had issues with this function, and it is hardly ideal anyway, as using it with a std::string would require iterating over each character.
Is there an alternative which works 100% of the time?
There is a way to convert upper case to lower case WITHOUT doing if tests, and it's pretty straightforward. The isupper() function/macro's use of clocale.h should take care of problems relating to your locale, but if not, you can always tweak UtoL[] to your heart's content.
Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment), you can create a 256-byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.
Instead of a 1-for-1 mapping, though, give the upper-case array members the byte values of the corresponding lower-case characters. You may find islower() and isupper() useful here.
The code looks like this...
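A minimal sketch of such a table-driven conversion, assuming 8-bit chars and the default "C" locale. The table name UtoL follows the text above; make_UtoL and to_lower_table are hypothetical helper names chosen for illustration.

```cpp
#include <array>
#include <cctype>
#include <climits>
#include <cstddef>
#include <string>

// Build the table once: every entry maps to itself, except that the
// upper-case entries hold the value of the matching lower-case character.
// Assumption: 8-bit chars; isupper()/tolower() follow the current C locale.
static std::array<unsigned char, UCHAR_MAX + 1> make_UtoL() {
    std::array<unsigned char, UCHAR_MAX + 1> table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        const int c = static_cast<int>(i);
        table[i] = static_cast<unsigned char>(std::isupper(c) ? std::tolower(c) : c);
    }
    return table;
}

// Convert in place by using each char as a subscript into the table.
// The cast to unsigned char keeps the index non-negative when char is signed.
inline void to_lower_table(std::string& s) {
    static const std::array<unsigned char, UCHAR_MAX + 1> UtoL = make_UtoL();
    for (char& ch : s) {
        ch = static_cast<char>(UtoL[static_cast<unsigned char>(ch)]);
    }
}
```

The only test in the sketch is the one used while filling the table; the conversion loop itself is a plain indexed load and store per character.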
This approach will, at the same time, allow you to remap any other characters you wish to change.
This approach has one huge advantage when running on modern processors: the conversion loop contains no if tests, so there are no data-dependent branches to predict. This saves the CPU's branch prediction logic for other loops, and tends to prevent pipeline stalls.
Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.
On Microsoft platforms you can use the strlwr family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx
If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html
I tried std::transform; all I get is an abominable, cryptic STL compilation error that only druids from 200 years ago can understand (cannot convert from to flibidi flabidi flu).
This works fine and can be easily tweaked:
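A minimal sketch of a plain-loop version, assuming an ASCII-compatible character set in which 'A' to 'Z' and 'a' to 'z' are contiguous. The function name lower_ascii is hypothetical.

```cpp
#include <string>

// ASCII-only conversion: bytes outside 'A'..'Z' are left unchanged,
// so digits, punctuation and the bytes of UTF-8 sequences pass through.
std::string lower_ascii(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}
```

Because the range test is written out by hand, extra mappings can be added as further conditions in the loop body.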
As far as I can see, Boost libraries are really bad performance-wise. I tested their unordered_map against the STL one, and it was on average 3 times slower (best case 2 times, worst case 10 times). Also this algorithm looks too low.
The difference is so big that I am sure whatever additions you need to make to tolower to make it equal to Boost "for your needs" will be way faster than Boost.
I have done these tests on an Amazon EC2 instance, so performance varied during the test, but you still get the idea.
-O2 made it like this:
Source:
I guess I should do the tests on a dedicated machine, but I will be using this EC2 instance, so I do not really need to test it on my own machine.
std::ctype::tolower() from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page:
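A minimal sketch of calling the facet on a whole std::string. This is not the reference page's own listing; the function name lower_with_ctype and the default of std::locale::classic() are assumptions made for illustration.

```cpp
#include <locale>
#include <string>

// Lower-case a string through the ctype<char> facet of the given locale.
// The range overload tolower(char* low, const char* high) converts the
// characters in [low, high) in place.
std::string lower_with_ctype(std::string s,
                             const std::locale& loc = std::locale::classic()) {
    if (!s.empty()) {
        const std::ctype<char>& facet = std::use_facet<std::ctype<char>>(loc);
        facet.tolower(&s[0], &s[0] + s.size());
    }
    return s;
}
```

Passing a different std::locale selects that locale's single-byte case mapping. The facet works one char at a time, so multi-byte UTF-8 sequences are still not handled by this call.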