Vector vs string

Posted 2019-01-15 06:26

Question:

What is the fundamental difference, if any, between a C++ std::vector and std::basic_string?

Answer 1:

  • basic_string doesn't call constructors and destructors of its elements. vector does.

  • swapping basic_string invalidates iterators (enabling small string optimization), swapping vectors doesn't.

  • basic_string memory may not be allocated contiguously in C++03. vector is always contiguous. This difference is removed in C++11 [string.require]:

    The char-like objects in a basic_string object shall be stored contiguously

  • basic_string has interface for string operations. vector doesn't.

  • basic_string may use a copy-on-write strategy (before C++11). vector can't.

Relevant quotes for non-believers:

[basic.string]:

The class template basic_string conforms to the requirements for a Sequence Container (23.2.3), for a Reversible Container (23.2), and for an Allocator-aware container (Table 99), except that basic_string does not construct or destroy its elements using allocator_traits::construct and allocator_traits::destroy and that swap() for basic_string invalidates iterators. The iterators supported by basic_string are random access iterators (24.2.7).
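
The first bullet above can be observed for vector with a small counting type. This is only a sketch, assuming a C++11 or later library; Tracker and count_lifetimes are hypothetical names introduced for illustration and do not come from the standard or from this answer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type (not from the answer): counts constructions and destructions.
struct Tracker {
    static int constructed;
    static int destroyed;
    Tracker() { ++constructed; }
    Tracker(const Tracker&) { ++constructed; }
    ~Tracker() { ++destroyed; }
};
int Tracker::constructed = 0;
int Tracker::destroyed = 0;

// Creates and destroys a vector of n Trackers, then reports
// {number of constructor calls, number of destructor calls}.
std::pair<int, int> count_lifetimes(std::size_t n) {
    Tracker::constructed = 0;
    Tracker::destroyed = 0;
    {
        std::vector<Tracker> v(n);  // constructs n elements (C++11 and later)
    }                               // ~vector destroys each element
    assert(Tracker::constructed == Tracker::destroyed);  // every element was cleaned up
    return std::make_pair(Tracker::constructed, Tracker::destroyed);
}
```

With a C++11 library, count_lifetimes(3) should report three constructions and three destructions. The same experiment is not possible with basic_string, because a type with a non-trivial constructor or destructor is not a valid char-like element type.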



Answer 2:

basic_string gives compiler and standard library implementations a few freedoms over vector:

  1. The "small string optimization" is valid on strings, which allows implementations to store the actual string, rather than a pointer to the string, in the string object when the string is short. Something along the lines of:

    class string
    {
        size_t length;
        union
        {
            char * usedWhenStringIsLong;
            char usedWhenStringIsShort[sizeof(char*)];
        };
    };
    
  2. In C++03, the underlying array need not be contiguous. Implementing basic_string in terms of something like a "rope" would be possible under that standard. (Though nobody does this, because it would make the members std::basic_string::c_str() and std::basic_string::data() too expensive to implement.)
    C++11 bans this behavior, though.

  3. In C++03, basic_string allows the compiler/library vendor to use copy-on-write for the data (which can save on copies), which is not allowed for std::vector. In practice, this used to be a lot more common, but it's less common nowadays because of the impact it has upon multithreading. Either way though, your code cannot rely on whether or not std::basic_string is implemented using COW.
    C++11 again now bans this behavior.
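
Whether a given library applies the small string optimization from item 1 can be probed by checking whether the character buffer lies inside the string object itself. This is a rough sketch; stores_inline is a hypothetical helper, and the result for short strings depends entirely on the implementation.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: returns true if the character buffer of `s` lies inside
// the bytes of the string object itself, which is what an implementation with
// the small string optimization does for short strings.
bool stores_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* buf = s.data();
    assert(buf != nullptr);          // data() never returns a null pointer
    std::less<const char*> before;   // total order, even for unrelated pointers
    return !before(buf, obj_begin) && before(buf, obj_end);
}
```

On implementations that use the optimization, stores_inline(std::string("hi")) is expected to return true, while a 1000-character string will not fit inside the object on mainstream implementations and is stored on the heap.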

There are a few helper methods tacked on to basic_string as well, but most are simple and of course could easily be implemented on top of vector.
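
As an illustration of that last point, a find-like operation can be layered on top of vector<char> with std::search. This is a minimal sketch; find_in_vector is a hypothetical helper, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: mimics std::string::find for a vector<char>.
// Returns the index of the first occurrence of `needle`, or std::string::npos.
std::size_t find_in_vector(const std::vector<char>& haystack, const std::string& needle) {
    if (needle.empty()) return 0;  // same convention as std::string::find from position 0
    std::vector<char>::const_iterator it =
        std::search(haystack.begin(), haystack.end(), needle.begin(), needle.end());
    if (it == haystack.end()) return std::string::npos;
    const std::size_t index = static_cast<std::size_t>(it - haystack.begin());
    assert(index + needle.size() <= haystack.size());  // the match lies inside the haystack
    return index;
}
```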



Answer 3:

The key difference is that std::vector must keep its data in contiguous memory, whereas std::basic_string was not required to before C++11. As a result, under C++03:

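std::vector<char> v( 3, 'a' );  // three 'a' characters: the count comes first, then the value
char* x = &v[0];                // valid: points to a contiguous buffer

std::basic_string<char> s( "aaa" );
char* x2 = &s[0];     // C++03 does not guarantee that this points to a contiguous buffer
// For example, C++03 does not guarantee the behavior of
std::cout << *(x2+1);
// (it is well defined since C++11).
const char* x3 = s.c_str(); // valid: always a contiguous, null-terminated buffer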

In practice, this difference is not very important.
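
Since C++11 requires contiguous storage, the property can be checked directly. The following is a small sketch assuming a C++11 or later library; is_contiguous is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if element i of `s` lives exactly
// i positions after element 0, which C++11 guarantees for every basic_string.
bool is_contiguous(const std::string& s) {
    assert(s.data() == &s[0]);  // C++11: data() points at the first element
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (&s[i] != &s[0] + i) return false;
    }
    return true;
}
```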



Answer 4:

A vector is a data structure that behaves like an array. Internally it is actually a (dynamic) array.

The basic_string class represents a Sequence of characters. It contains all the usual operations of a Sequence, and, additionally, it contains standard string operations such as search and concatenation.

You can use vector to keep whatever data type you want: std::vector<int>, std::vector<float>, or even std::vector< std::vector<T> >. A basic_string, however, is meant only for representing "text", that is, sequences of char-like elements.



Answer 5:

The basic_string provides many string-specific comparison options. You are right in that the underlying memory management interface is very similar, but string contains many additional members, like c_str(), that would make no sense for a vector.



Answer 6:

One difference between std::string and std::vector is that programs may construct a string from a null-terminated string, whereas with vectors they cannot.

std::string a = "hello";          // okay
std::vector<char> b = "goodbye";  // compiler error

This often makes strings easier to work with.
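
A vector<char> can still be filled from a null-terminated string through its iterator-range constructor; it just takes an extra step. A minimal sketch, where to_vector is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical helper: copies the characters of a null-terminated string
// (without the terminator) into a vector<char>.
std::vector<char> to_vector(const char* text) {
    assert(text != nullptr);  // precondition: must point to a null-terminated string
    return std::vector<char>(text, text + std::strlen(text));
}
```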



Answer 7:

TLDR: strings are optimized to only contain character primitives; vectors can contain primitives or objects.

The preeminent difference between vector and string is that vector can correctly contain objects, whereas string works only on primitives. So vector provides these methods that would be useless for a string working with primitives:

  1. vector::emplace
  2. vector::emplace_back
  3. vector::~vector (which runs each element's destructor)

Even extending string will not allow it to correctly handle objects, because it never runs its elements' destructors. This should not be viewed as a drawback: it allows significant optimization over vector, in that string can:

  1. Do short string optimization, potentially avoiding heap allocation, with little to no increased storage overhead
  2. Use char_traits, one of string's template arguments, to define how operations should be implemented on the contained primitives (of which only char, wchar_t, char16_t, and char32_t are implemented: http://en.cppreference.com/w/cpp/string/char_traits)
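
To illustrate the second point, a custom traits class can change how a string compares its characters. The sketch below builds a case-insensitive string type; ci_char_traits, ci_string, and ci_string_demo are hypothetical names, and only eq, lt, and compare are overridden, so operations such as find would need further work to ignore case as well.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical traits class (not from the answer): compares chars ignoring ASCII case.
struct ci_char_traits : std::char_traits<char> {
    static char lower(char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return lower(a) == lower(b); }
    static bool lt(char a, char b) { return lower(a) < lower(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

// Small self-check showing the effect of the custom traits.
void ci_string_demo() {
    assert(ci_string("Hello") == ci_string("hELLO"));  // equal, ignoring case
    assert(ci_string("apple") < ci_string("BANANA"));  // ordering also ignores case
}
```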

Particularly relevant are char_traits::copy, char_traits::move, and char_traits::assign, which imply that direct assignment, rather than construction or destruction, will be used; this is again preferable for primitives. All this specialization has additional drawbacks for string:

  1. Only char, wchar_t, char16_t, or char32_t primitive types will be used. Primitives of up to 32 bits could use their equivalently sized char_type (https://stackoverflow.com/a/35555016/2642059), but for primitives such as long long a new specialization of char_traits would need to be written, and specializing char_traits::eof and char_traits::not_eof instead of just using vector<long long> doesn't seem like the best use of time.
  2. Because of short string optimization, iterators are invalidated by all the operations that would invalidate a vector iterator, but string iterators are additionally invalidated by string::swap and string::operator=
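
The vector side of this guarantee can be sketched as follows: after a swap, an iterator obtained from one vector refers to the same element, which now belongs to the other vector. iterator_follows_swap is a hypothetical helper name; the equivalent check on a string is deliberately not shown, because using a string iterator after swap is not guaranteed to be valid.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: swaps two vectors and checks that an iterator taken
// from `a` before the swap now refers to the first element of `b`.
bool iterator_follows_swap(std::vector<char>& a, std::vector<char>& b) {
    assert(!a.empty());  // precondition: there must be an element to point at
    std::vector<char>::iterator it = a.begin();
    const char first = *it;
    a.swap(b);           // vector::swap keeps iterators valid
    return it == b.begin() && *it == first;
}
```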

Additional differences in the interfaces of vector and string:

  1. There is no mutable string::data in C++11 (a non-const overload was added in C++17); see "Why Doesn't std::string.data() provide a mutable char*?"
  2. string provides functionality for working with text that is unavailable in vector. Members: string::c_str, string::length, string::append, string::operator+=, string::compare, string::replace, string::substr, string::copy, string::find, string::rfind, string::find_first_of, string::find_first_not_of, string::find_last_of, string::find_last_not_of. Non-member functions declared in <string>: operator+, operator>>, operator<<, std::stoi, std::stol, std::stoll, std::stoul, std::stoull, std::stof, std::stod, std::stold, std::to_string, std::to_wstring
  3. Finally, everywhere vector accepts another vector as an argument, string accepts a string or a char*

Note this answer is written against C++11, so strings are required to be allocated contiguously.