Which tolower in C++?

Given string foo, I've written answers on how to use cctype's tolower to convert the characters to lowercase:

transform(cbegin(foo), cend(foo), begin(foo), static_cast<int (*)(int)>(tolower));

But I've begun to consider locale's tolower, which could be used like this:

use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), foo.size()));

Is there a reason to prefer one of these over the other?
Does their functionality differ at all?
I mean, other than the fact that cctype's tolower accepts and returns an int, which I assume is just some antiquated C stuff?

Tags: c++ string locale ctype tolower

3 answers

chillily

Post #2 · 2019-01-19 10:51

In the first case (cctype), the locale is set implicitly:

Converts the given character to lowercase according to the character conversion rules defined by the currently installed C locale.

http://en.cppreference.com/w/cpp/string/byte/tolower

In the second case (locale's), you have to set the locale explicitly:

Converts parameter c to its lowercase equivalent if c is an uppercase letter and has a lowercase equivalent, as determined by the ctype facet of locale loc. If no such conversion is possible, the value returned is c unchanged.

http://www.cplusplus.com/reference/locale/tolower/


做个烂人

Post #3 · 2019-01-19 10:56

Unfortunately, both are equally bad. Although std::string is commonly used to hold UTF-8 encoded text, none of its methods or the related functions (including tolower) are really UTF-8 aware. So while tolower and tolower + locale may work with single-byte characters (e.g. ASCII), they will fail for languages whose characters take more than one byte in UTF-8.

On Linux, I'd use the ICU library. On Windows, I'd use the CharLower function (the lowercase counterpart of CharUpper).


兄弟一词,经得起流年.

Post #4 · 2019-01-19 10:58

It should be noted that the language designers were aware of cctype's tolower when locale's tolower was created. It improved on it in two primary ways:

First, as mentioned in progressive_overload's answer, the locale version allows the use of the ctype facet, even a user-modified one, without requiring a new LC_CTYPE to be shuffled in via setlocale and the previous LC_CTYPE to be restored afterward.

Second, from section 7.1.6.2 [dcl.type.simple] paragraph 3:

It is implementation-defined whether objects of char type are represented as signed or unsigned quantities. The signed specifier forces char objects to be signed

This creates the potential for undefined behavior with the cctype version of tolower if its argument:

Is not representable as unsigned char and does not equal EOF

So the cctype version of tolower requires an additional conversion of the input to unsigned char (with the int result converted back to char on output), yielding:

transform(cbegin(foo), cend(foo), begin(foo), [](const unsigned char i){ return tolower(i); });

Since the locale version operates directly on chars, there is no need for a type conversion.

So if you don't need to perform the conversion with a different ctype facet, it simply becomes a question of style: whether you prefer the transform with a lambda that the cctype version requires, or the locale version's:

use_facet<ctype<char>>(cout.getloc()).tolower(data(foo), next(data(foo), size(foo)));

