Advantages of using user-defined literal for strin

Posted 2019-02-07 16:08

Question:

The strings topic in the SO Documentation used to say, in the Remarks section:

Since C++14, instead of using "foo", it is recommended to use "foo"s, as s is a string literal, which converts the const char * "foo" to std::string "foo".

The only advantage I see using

std::string str = "foo"s;

instead of

std::string str = "foo";

is that in the first case the compiler can perform copy-elision (I think), which would be faster than the constructor call in the second case.

Nonetheless, this is (not yet) guaranteed, so the first one might also call a constructor, the copy constructor.

Ignoring cases where it is required to use std::string literals like

std::string str = "Hello "s + "World!"s;

is there any benefit of using std::string literals instead of const char[] literals?

Answer 1:

If you're part of the "Almost Always Auto" crowd, then the UDL is very important. It lets you do this:

auto str = "Foo"s;

And thus, str will be a genuine std::string, not a const char*. It therefore permits you to decide when to do which.

This is also important for auto return type deduction:

[]() {return "Foo"s;}

Or any form of type deduction, really:

template<typename T>
void foo(T &&t) {...}

foo("Foo"s);
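The deduction cases above can be checked directly. The following is a minimal sketch, assuming a C++14 compiler; the names `deduces_string`, `plain_literal` and `string_literal` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

using namespace std::string_literals; // brings operator""s into scope

// Hypothetical helper: reports whether the deduced (decayed) argument type
// is std::string.
template <typename T>
constexpr bool deduces_string(T&&) {
    return std::is_same<typename std::decay<T>::type, std::string>::value;
}

// Return type deduction: the first lambda returns const char*,
// the second returns std::string.
auto plain_literal = [] { return "Foo"; };
auto string_literal = [] { return "Foo"s; };

static_assert(std::is_same<decltype(plain_literal()), const char*>::value,
              "a plain literal decays to const char*");
static_assert(std::is_same<decltype(string_literal()), std::string>::value,
              "the s suffix yields std::string");
```

Calling `deduces_string("Foo")` gives false because `T` is deduced as a reference to a char array, while `deduces_string("Foo"s)` gives true.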

"The only advantage I see using [...] instead of [...] is that in the first case the compiler can perform copy-elision (I think), which would be faster than the constructor call in the second case."

Copy-elision is not faster than the constructor call. Either way, you're calling one of the object's constructors. The question is which one:

std::string str = "foo";

This will provoke a call to the constructor of std::string which takes a const char*. But since std::string has to copy the string into its own storage, it must get the length of the string to do so. And since it doesn't know the length, this constructor is forced to use strlen to get it (technically, char_traits<char>::length, but that's probably not going to be much faster).

By contrast:

std::string str = "foo"s;

This will use the UDL operator (an ordinary overload, not a template) that has this prototype:

std::string operator""s(const char* str, std::size_t len);

See, the compiler knows the length of a string literal. So the UDL code is passed a pointer to the string and a size. And thus, it can call the std::string constructor that takes a const char* and a size_t. So there's no need for computing the string's length.
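To make the difference concrete, here is a rough sketch of the two construction paths. The suffix `_str` and the function `from_c_string` are hypothetical names invented for this example; the real `operator""s` lives in the standard library and may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical suffix _str, written only to show what a string UDL receives.
// The compiler supplies len from the literal itself, so nothing has to scan
// the characters to find the end.
std::string operator""_str(const char* str, std::size_t len) {
    return std::string(str, len); // pointer-plus-length constructor
}

// For comparison: this constructor only gets a pointer, so it has to search
// for the terminating '\0' to learn the length.
std::string from_c_string(const char* str) {
    return std::string(str);
}
```

Both paths produce the same string for an ordinary literal such as `"foo"`; they differ only in how the length is obtained.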

The advice in question is not for you to go around and convert every use of a literal into the s version. If you're fine with the limitations of an array of chars, use it. The advice is that, if you're going to store that literal in a std::string, it's best to get that done while it's still a literal and not a nebulous const char*.



Answer 2:

The advice to use "blah"s has nothing to do with efficiency and all to do with correctness for novice code.

C++ novices who don't have a background in C tend to assume that "blah" results in an object of some reasonable string type, so that one can write things like "blah" + 42, which works in many scripting languages. In C++, however, "blah" + 42 is pointer arithmetic on the decayed array: it yields a pointer far beyond the end of the character array, which is Undefined Behavior.

But if that string literal is written as "blah"s then one instead gets a compilation error, which is much preferable.
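One way to see this without actually triggering the error is a small detection trait. This is only a sketch; `can_add` is a hypothetical name, and `const char*` stands in for the decayed type of a plain literal.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when the expression `L + R` compiles.
template <typename L, typename R, typename = void>
struct can_add : std::false_type {};

template <typename L, typename R>
struct can_add<L, R,
               decltype(void(std::declval<L>() + std::declval<R>()))>
    : std::true_type {};

// "blah" + 42 compiles: it is plain pointer arithmetic.
static_assert(can_add<const char*, int>::value,
              "pointer + int is accepted by the compiler");

// "blah"s + 42 does not compile: no operator+ takes std::string and int.
static_assert(!can_add<std::string, int>::value,
              "std::string + int is rejected by the compiler");
```

The first assertion only shows that the expression is accepted, not that evaluating it is safe.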



Answer 3:

In addition, the UDL makes it easier to have \0 inside the string:

std::string s = "foo\0bar"s; // s contains a \0 in its middle.
std::string s2 = "foo\0bar"; // equivalent to "foo"s
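The resulting sizes make the difference visible. A minimal sketch, with the hypothetical function names `with_suffix` and `without_suffix`:

```cpp
#include <cassert>
#include <string>

using namespace std::string_literals; // brings operator""s into scope

// The length comes from the literal, so the embedded '\0' and "bar" are kept.
std::string with_suffix() { return "foo\0bar"s; }

// The length is found by scanning for '\0', so the string stops after "foo".
std::string without_suffix() { return "foo\0bar"; }
```

Here `with_suffix().size()` is 7 (three letters, the null character, three more letters), while `without_suffix().size()` is 3.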


Answer 4:

  1. Using a C++ string literal means we do not need to call strlen to compute the length. The compiler already knows it.
  2. It might allow library implementations where the string data points to memory in global storage, whereas constructing from a C literal must always copy the data into memory owned by the string.