How to convert std::string to lower case?

Posted 2018-12-31 04:35

I want to convert a std::string to lowercase. I am aware of the function tolower(); however, in the past I have had issues with it, and it is hardly ideal anyway, since using it with a std::string requires iterating over each character.

Is there an alternative which works 100% of the time?

21 answers
低头抚发
#2 · 2018-12-31 05:05

An alternative to Boost is POCO (pocoproject.org).

POCO provides two variants:

  1. The first variant makes a copy without altering the original string.
  2. The second variant changes the original string in place.
    "In Place" versions always have "InPlace" in the name.

Both versions are demonstrated below:

#include <iostream>
#include <string>
#include "Poco/String.h"
using namespace Poco;

int main()
{
    std::string hello("Stack Overflow!");

    // Copies "STACK OVERFLOW!" into 'newString' without altering 'hello'.
    std::string newString(toUpper(hello));

    // Changes newString in place to read "stack overflow!"
    toLowerInPlace(newString);

    std::cout << hello << "\n" << newString << "\n";
    return 0;
}
无色无味的生活
#3 · 2018-12-31 05:06

Another approach: a range-based for loop with a reference variable:

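#include <cctype>
#include <iostream>
#include <string>

int main()
{
    std::string test = "Hello World";
    for (auto& c : test)
    {
        // Cast to unsigned char first: passing a negative char to tolower is undefined
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    std::cout << test << std::endl;
}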
人间绝色
#4 · 2018-12-31 05:08

This is a follow-up to Stefan Mai's response: if you want to place the result of the conversion in another string, you need to pre-allocate its storage before calling std::transform. std::transform writes each transformed character through the destination iterator (incrementing it at every step); it does not resize the destination string, so writing into an empty string runs past its end, which is undefined behavior (memory stomping).

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

int main()
{
  std::string sourceString = "Abc";
  std::string destinationString;

  // Allocate the destination space
  destinationString.resize(sourceString.size());

  // Convert the source string to lower case,
  // storing the result in the destination string.
  // The lambda takes unsigned char because passing a negative
  // char value to std::tolower is undefined behavior.
  std::transform(sourceString.begin(),
                 sourceString.end(),
                 destinationString.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  // Output the result of the conversion
  std::cout << sourceString
            << " -> "
            << destinationString
            << std::endl;
}
临风纵饮
#5 · 2018-12-31 05:09

The simplest way to convert a string to lowercase without bothering about the std namespace is as follows:

1. A string with or without spaces (read with getline):

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    getline(cin,str);
//------------function to convert string into lowercase---------------
    // unsigned char parameter: tolower on a negative char is undefined
    transform(str.begin(), str.end(), str.begin(),
              [](unsigned char c){ return static_cast<char>(tolower(c)); });
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}

2. A single word without spaces (read with cin >>, which stops at the first whitespace):

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    cin>>str;
//------------function to convert string into lowercase---------------
    // unsigned char parameter: tolower on a negative char is undefined
    transform(str.begin(), str.end(), str.begin(),
              [](unsigned char c){ return static_cast<char>(tolower(c)); });
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}
冷夜・残月
#6 · 2018-12-31 05:09

Reposting the previous answer with an explanation, because I was not allowed to edit it. Thanks, SO.


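string test = "Hello World";
for(auto& c : test)
{
   // Cast to unsigned char first: tolower on a negative char is undefined
   c = static_cast<char>(tolower(static_cast<unsigned char>(c)));
}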

Explanation:

for (auto& c : test) is a range-based for loop of the form
for (range_declaration : range_expression) loop_statement, where:

  1. range_declaration: auto& c
    Here the auto specifier is used for automatic type deduction: the type is deduced from the elements of the range, so c is a char&.

  2. range_expression: test
    The range in this case is the sequence of characters in the string test.

Inside the loop, each character of test is accessed by reference through the identifier c, so assigning to c modifies the string itself.

零度萤火
#7 · 2018-12-31 05:10

tl;dr

Use the ICU library.


First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)

If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself into believing that you are still in control of things, because you are storing a multibyte character sequence in a container that is not aware of the multibyte concept. Even something as simple as .substr() is a ticking timebomb. (Because splitting a multibyte sequence will result in an invalid (sub-) string.)

And as soon as you try something like std::toupper( 'ß' ), in any encoding, you are in deep trouble. (Because it's simply not possible to do this "right" with the standard library, which can only deliver one result character, not the "SS" needed here.) [1] Another example would be std::tolower( 'I' ), which should yield different results depending on the locale. In Germany, 'i' would be correct; in Turkey, 'ı' (LATIN SMALL LETTER DOTLESS I) is the expected result.

Then there is the fact that the standard library depends on which locales are installed on the machine your software runs on... and what do you do if the one you need isn't?

So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not std::string.

(C++11 note: std::u16string and std::u32string are better, but still not perfect.)

While Boost looks nice API-wise, Boost.Locale is basically a wrapper around ICU, provided Boost was compiled with ICU support. If it wasn't, Boost.Locale is limited to the locale support compiled into the standard library.

And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows, so you'd have to supply them together with your application, and that opens a whole new can of worms...)

So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:

#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>

#include <iostream>

int main()
{
    char const * someString = "Eidenges\xe4\xdf";
    icu::UnicodeString someUString( someString, "ISO-8859-1" );
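    // Setting the locale explicitly here for completeness.
    // Usually you would use the user-specified system locale.
    // Note: toLower() and toUpper() modify someUString in place and
    // return a reference to it, so toUpper() works on the already
    // lowercased string (which gives the same result here).
    std::cout << someUString.toLower( "de_DE" ) << "\n";
    std::cout << someUString.toUpper( "de_DE" ) << "\n";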
    return 0;
}

Compile (with G++ in this example):

g++ -Wall example.cpp -licuuc -licuio

This gives:

eidengesäß
EIDENGESÄSS

[1] In 2017, the Council for German Orthography ruled that "ẞ" U+1E9E LATIN CAPITAL LETTER SHARP S could be used officially, as an option beside the traditional "SS" conversion to avoid ambiguity e.g. in passports (where names are capitalized). My beautiful go-to example, made obsolete by committee decision...
