C++11 case insensitive comparison of beginning of

2020-03-01 20:26发布

问题:

I have to check if the particular string begins with another one. Strings are encoded using utf8, and a comparison should be case insensitive.

I know that this is very similar to that topic Case insensitive string comparison in C++ but I do not want to use the boost library and I prefer portable solutions (If it is 'nearly' impossible, I prefer Linux oriented solutions).

Is it possible in C++11 using its regexp library? Or just using simple string compare methods?

回答1:

The only way I know of that is UTF8/internationalization/culture-aware is the excellent and well-maintained IBM ICU: International Components for Unicode. It's a C/C++ library for *nix or Windows into which a ton of research has gone to provide a culture-aware string library, including case-insensitive string comparison that's both fast and accurate.

IMHO, the two things you should never write yourself unless you're doing a thesis paper are encryption and culture-sensitive string libraries.



回答2:

Are there any restrictions on what can be in the string you're looking for? It it's user input, and can be any UTF-8 string, the problem is extremely complex. As others have mentioned, one character can have several different representations, so you'd probably have to normalize the strings first. Then: what counts as equal? Should 'E' compare equal to 'é' (as is usual in some circles in French), or not (which would be conform to the "official" rules of the Imprimerie nationale).

For all but the most trivial definitions, rolling your own will represent a significant effort. For this sort of thing, the library ICU is the reference. It contains all that you'll need. Note however that it works on UTF16, not UTF8, so you'll have to convert the strings first, as well as normalizing them. (ICU has support for both.)



回答3:

Using the stl regex classes you could do something like the following snippet. Unfortunately its not utf8. Changing str2 to std::wstring str2 = L"hello World" results in a lot of conversion warnings. Making str1 an std::wchar doesn't work at all, since std::regex doesn't allow a whar input (as far as i can see).

#include <regex>
#include <iostream>
#include <string>

int main()
{
    //The input strings
    std::string str1 = "Hello";
    std::string str2 = "hello World";

    //Define the regular expression using case-insensitivity
    std::regex regx(str1, std::regex_constants::icase);

    //Only search at the beginning 
    std::regex_constants::match_flag_type fl = std::regex_constants::match_continuous;

    //display some output
    std::cout << std::boolalpha << std::regex_search(str2.begin(), str2.end(), regx, fl) << std::endl;

    return 0;
}