vector vs string for binary data

Posted 2020-01-30 08:51

Question:

Which is a better C++ container for holding and accessing binary data?

std::vector<unsigned char>

or

std::string

Is one more efficient than the other?
Is one a more 'correct' usage?

Answer 1:

You should prefer std::vector over std::string. In common cases the two solutions are almost equivalent, but std::string is designed specifically for text and string manipulation, and that is not your intended use.



Answer 2:

Both are correct and equally efficient. The only reason to use either of them instead of a plain array is to ease memory management and passing them as arguments.

I use vector because it states the intent more clearly than string does.

Edit: The C++03 standard does not guarantee that std::basic_string storage is contiguous. From a practical viewpoint, however, there are no commercial non-contiguous implementations, and C++0x (published as C++11) standardizes that guarantee.
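The contiguity point can be checked directly. The following is a minimal sketch, assuming a C++11 compiler; `is_contiguous` and `check_contiguity` are hypothetical helper names, and the four-byte payload is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if every element of the container sits at &c[0] + i,
// i.e. the storage is one contiguous block.
template <typename Container>
bool is_contiguous(const Container& c) {
    for (std::size_t i = 0; i < c.size(); ++i) {
        if (&c[i] != &c[0] + i) return false;
    }
    return true;
}

// Builds the same (made-up) payload in both containers and checks them.
inline void check_contiguity() {
    const std::vector<unsigned char> v = {0xDE, 0x00, 0xBE, 0xEF};
    // The (pointer, length) constructor keeps the embedded zero byte.
    const std::string s("\xDE\x00\xBE\xEF", 4);

    assert(v.size() == 4);
    assert(s.size() == 4);
    assert(is_contiguous(v));
    assert(is_contiguous(s));
}
```

Note that the string must be built with an explicit length; constructing it from the literal alone would stop at the first zero byte.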



Answer 3:

Is one more efficient than the other?

This is the wrong question.

Is one a more 'correct' usage?

This is the correct question.
It depends. How is the data being used? If you are going to use the data in a string-like fashion, then you should opt for std::string, as using a std::vector may confuse subsequent maintainers. If, on the other hand, most of the data manipulation looks like plain maths or is vector-like, then a std::vector is more appropriate.



Answer 4:

For the longest time I agreed with most answers here. However, just today it hit me why it might actually be wiser to use std::string over std::vector<unsigned char>.

As most agree, using either one will work just fine. But oftentimes file data is actually in text format (more common now that XML has become mainstream). Holding it in a string makes it easy to view in the debugger when that becomes pertinent (and these debuggers will often let you navigate the bytes of the string anyway). More importantly, many existing functions that operate on a string can just as easily be used on file or binary data. I've found myself writing multiple functions to handle both strings and byte arrays, and realized how pointless it all was.
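As a rough illustration of reusing string functions on binary data, the sketch below searches a buffer for a byte pattern with std::string::find. The helper names `find_pattern` and `demo_find` and the payload bytes are assumptions made up for this example.

```cpp
#include <cassert>
#include <string>

// Finds the first occurrence of a byte pattern in a binary buffer held in a
// std::string. Returns std::string::npos when the pattern is absent.
inline std::string::size_type find_pattern(const std::string& buffer,
                                           const std::string& pattern) {
    return buffer.find(pattern);
}

inline void demo_find() {
    // Made-up payload: two header bytes, an embedded zero, then a marker.
    const std::string buffer("\x01\x02\x00\xFF\xD8\xFF\x10", 7);
    const std::string marker("\xFF\xD8\xFF", 3);

    assert(buffer.size() == 7);
    // find() works on the stored length, so the zero byte is not a terminator.
    assert(find_pattern(buffer, marker) == 3);
    assert(buffer.substr(3, 3) == marker);
    assert(find_pattern(buffer, std::string("\xAA", 1)) == std::string::npos);
}
```

The same search on a std::vector<unsigned char> would typically go through std::search from <algorithm> instead.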



Answer 5:

This is a comment on dribeas's answer. I am writing it as an answer to be able to format the code.

This is the char_traits compare function, and the behaviour is quite healthy:

static bool
lt(const char_type& __c1, const char_type& __c2)
{ return __c1 < __c2; }

template<typename _CharT>
int
char_traits<_CharT>::
compare(const char_type* __s1, const char_type* __s2, std::size_t __n)
{
  for (std::size_t __i = 0; __i < __n; ++__i)
    if (lt(__s1[__i], __s2[__i]))
      return -1;
    else if (lt(__s2[__i], __s1[__i]))
      return 1;
  return 0;
}
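One consequence of the loop above is that the comparison covers all n elements and does not stop at a zero byte. The following sketch, assuming the standard std::char_traits<char> specialization behaves the same way in this respect, shows that from the caller's side; `compare_bytes` and `demo_compare` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Three-way comparison of two byte ranges of equal length n, delegating to
// std::char_traits<char>::compare. Returns <0, 0 or >0.
inline int compare_bytes(const char* a, const char* b, std::size_t n) {
    return std::char_traits<char>::compare(a, b, n);
}

inline void demo_compare() {
    const char x[] = {'a', '\0', 'b'};
    const char y[] = {'a', '\0', 'c'};

    assert(compare_bytes(x, x, 3) == 0);
    assert(compare_bytes(x, y, 3) < 0);  // differs only after the zero byte
    assert(compare_bytes(y, x, 3) > 0);

    // std::string comparison goes through the same traits, so embedded
    // zeros do not cut the comparison short.
    assert(std::string(x, 3) < std::string(y, 3));
}
```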


Answer 6:

As far as readability is concerned, I prefer std::vector. std::vector should be the default container in this case: the intent is clearer and, as other answers have already said, on most implementations it is also more efficient.

On one occasion I did prefer std::string over std::vector though. Let's look at the signatures of their move constructors in C++11:

vector (vector&& x);

string (string&& str) noexcept;

On that occasion I really needed a noexcept move constructor. std::string provides it and std::vector does not.
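Whether a given toolchain provides a noexcept move constructor can be checked with <type_traits>. This is a small sketch, assuming a C++11 or later compiler; `vector_move_is_noexcept` is a hypothetical helper name. Note that C++17 later added noexcept to the std::vector move constructor as well, and many library implementations already marked it noexcept before that, so the vector result is reported rather than asserted.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// std::string's move constructor has been declared noexcept since C++11.
static_assert(std::is_nothrow_move_constructible<std::string>::value,
              "std::string should be nothrow move constructible");

// For std::vector the answer depends on the standard version and the
// library in use, so it is returned as a value instead of being asserted.
inline bool vector_move_is_noexcept() {
    return std::is_nothrow_move_constructible<
        std::vector<unsigned char>>::value;
}
```

This matters, for example, when the buffer type is stored inside another container that uses std::move_if_noexcept during reallocation.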



Answer 7:

If you just want to store your binary data, you can use std::bitset, which is optimized for space. Otherwise go for vector, as it's more appropriate for your usage.



Answer 8:

Compare the two and choose for yourself which is more suitable. Both are very robust and work with STL algorithms. Choose whichever is more effective for your task.



Answer 9:

Personally I prefer std::string, because string::data() is much more intuitive for me when I want my binary buffer back in C-compatible form. I know that vector elements are guaranteed to be stored contiguously, but exercising this in code feels a little bit unsettling.

This is a style decision that an individual developer or a team should make for themselves.
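For comparison, here is a minimal sketch of handing both containers to a C-style function, assuming C++11 (where std::vector also has data()). `byte_sum` is a hypothetical stand-in for a real C API, and `demo_c_interop` and the payload are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical C-style consumer: sums the bytes of a raw buffer.
inline unsigned long byte_sum(const unsigned char* data, std::size_t size) {
    unsigned long total = 0;
    for (std::size_t i = 0; i < size; ++i) total += data[i];
    return total;
}

inline void demo_c_interop() {
    const std::vector<unsigned char> v = {0x01, 0x00, 0xFF};
    const std::string s("\x01\x00\xFF", 3);

    // vector<unsigned char>::data() already has the pointer type needed.
    assert(byte_sum(v.data(), v.size()) == 256);

    // string::data() yields const char*, so a cast is required to view the
    // same storage as unsigned bytes.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    assert(byte_sum(p, s.size()) == 256);
}
```

Which of the two call sites reads better is largely the style question raised here: the string version needs a cast when the API expects unsigned bytes, while the vector version does not.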