I want to convert a std::string to lowercase. I am aware of the function tolower(), however in the past I have had issues with this function and it is hardly ideal anyway as use with a std::string would require iterating over each character.
Is there an alternative which works 100% of the time?
An alternative to Boost is POCO (pocoproject.org).
POCO provides two variants:
"In Place" versions always have "InPlace" in the name.
Both versions are demonstrated below:
Another approach using range based for loop with reference variable
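A minimal sketch of such a loop, assuming a single-byte encoding and the default "C" locale (so only ASCII letters change). The helper names `to_lower_in_place` and `demo_to_lower_in_place` are hypothetical and only serve the illustration:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: lowercases `test` in place, one byte at a time.
// Assumes a single-byte encoding; in the default "C" locale only A-Z change.
void to_lower_in_place(std::string& test) {
    for (auto& c : test) {
        // Cast to unsigned char first: passing a negative char value
        // to std::tolower is undefined behavior.
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
}

// Hypothetical usage check.
void demo_to_lower_in_place() {
    std::string test = "Hello World";
    to_lower_in_place(test);
    assert(test == "hello world");
}
```

Because `c` is a reference, each assignment writes straight back into the string, so no second buffer is needed.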
This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since the STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.
The simplest way to convert a string to lowercase without bothering about the std namespace is as follows:
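A minimal sketch of the std::transform usage discussed here, assuming byte-wise lowercasing in the default "C" locale. The helper names are hypothetical and not taken from the original answers. It shows an in-place call, a pre-sized destination, and std::back_inserter as an alternative that grows the destination automatically:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Byte-wise lowercase of one character; the unsigned char parameter
// avoids undefined behavior for negative char values.
inline char lower_byte(unsigned char ch) {
    return static_cast<char>(std::tolower(ch));
}

// In place: source and destination are the same range, so no resizing is needed.
void lower_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), lower_byte);
}

// Separate destination: it must already have room for every character,
// because std::transform only writes through the output iterator.
std::string lower_copy_presized(const std::string& src) {
    std::string dst(src.size(), '\0');
    assert(dst.size() == src.size());
    std::transform(src.begin(), src.end(), dst.begin(), lower_byte);
    return dst;
}

// Alternative: std::back_inserter appends, so the string grows on demand.
std::string lower_copy_back_inserter(const std::string& src) {
    std::string dst;
    dst.reserve(src.size());  // optional, avoids repeated reallocation
    std::transform(src.begin(), src.end(), std::back_inserter(dst), lower_byte);
    return dst;
}
```

Writing to `dst.begin()` of an empty string is the memory-stomping case described above. Either of the two copy variants avoids it.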
1: string with/without spaces
2: string without spaces
Copy because it was disallowed to improve answer. Thanks SO
Explanation:
for(auto& c : test) is a range-based for loop of the form for (range_declaration : range_expression) loop_statement:
range_declaration: auto& c
Here the auto specifier is used for automatic type deduction, so the type is deduced from the variable's initializer.
range_expression: test
The range in this case is the sequence of characters of the string test.
The characters of the string test are available by reference inside the for loop through the identifier c.
tl;dr
Use the ICU library.
First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)
If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself into believing that you are still in control of things, because you are storing a multibyte character sequence in a container that is not aware of the multibyte concept. Even something as simple as .substr() is a ticking timebomb. (Because splitting a multibyte sequence will result in an invalid (sub-) string.)
And as soon as you try something like std::toupper( 'ß' ), in any encoding, you are in deep trouble. (Because it's simply not possible to do this "right" with the standard library, which can only deliver one result character, not the "SS" needed here.) [1] Another example would be std::tolower( 'I' ), which should yield different results depending on the locale. In Germany, 'i' would be correct; in Turkey, 'ı' (LATIN SMALL LETTER DOTLESS I) is the expected result.
Then there is the point that the standard library is depending on which locales are supported on the machine your software is running on... and what do you do if it isn't?
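To make the multibyte point concrete, here is a small sketch using only the standard library. It assumes the string holds UTF-8 and that the program runs in the default "C" locale (no setlocale call). The names `bytewise_lower` and `demo_utf8_pitfalls` are hypothetical:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: byte-wise lowercasing via std::tolower.
std::string bytewise_lower(std::string s) {
    for (auto& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// "Ä" (U+00C4) encoded as UTF-8 is the two bytes 0xC3 0x84.
const std::string a_umlaut_upper = "\xC3\x84";
// "ä" (U+00E4) encoded as UTF-8 is the two bytes 0xC3 0xA4.
const std::string a_umlaut_lower = "\xC3\xA4";

void demo_utf8_pitfalls() {
    // One character for the reader, but two elements for std::string.
    assert(a_umlaut_upper.size() == 2);
    // substr(0, 1) keeps only the lead byte, which is not valid UTF-8 on its own.
    assert(a_umlaut_upper.substr(0, 1) == "\xC3");
    // In the "C" locale, bytes above 0x7f pass through unchanged,
    // so the ASCII part is lowercased and the "Ä" is not.
    assert(bytewise_lower("ABC" + a_umlaut_upper) == "abc" + a_umlaut_upper);
    assert(bytewise_lower(a_umlaut_upper) != a_umlaut_lower);
}
```

The byte-wise loop does not corrupt the string here, but it silently leaves the non-ASCII letter in uppercase, which is the kind of partial result described above.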
So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not std::string.
(C++11 note: std::u16string and std::u32string are better, but still not perfect.)
While Boost looks nice, API-wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.
And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows, so you'd have to supply them together with your application, and that opens a whole new can of worms...)
So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:
Compile (with G++ in this example):
This gives:
[1] In 2017, the Council for German Orthography ruled that "ẞ" U+1E9E LATIN CAPITAL LETTER SHARP S could be used officially, as an option beside the traditional "SS" conversion to avoid ambiguity e.g. in passports (where names are capitalized). My beautiful go-to example, made obsolete by committee decision...