When would you use an array rather than a vector/s

2019-01-09 10:25发布

问题:

I'm a beginner C++ programmer and so I've learnt using arrays rather than vectors (this seems to be the general way to do things, then move on to vectors later on).

I've noticed that a lot of answers on SO suggest using vectors over arrays, and strings over char arrays. It seems that this is the "proper" way to code in C++.

That all being said, when is it still worth using a classic array/char* (if ever)?

回答1:

When writing code that should used in other projects, in particular if you target special platforms (embedded, game consoles, etc.) where STL might not be present.

Old projects or projects with special requirements might not want to introduce dependencies on STL libraries. An interface depending on arrays, char* or whatever will be compatible with anything since it's part of the language. STL however is not guaranteed to be present in all build environments.



回答2:

Never.

If a raw array seems a better solution than a vector (for reasons other said here) then I use std::tr1::array or std::array in C++11 compilers (or boost::array). It simply does the checks I would do anyway to be sure and the size value usage makes DRY automatically implemented (for example I use the size in loops so that future changes of the array declaration will automatically work).

It's the array implementation "is" a raw array with checks and provided size constant anyway, so it's easy to get the array code in embedded code too because the code is not really "too smart" for any compiler. As far as the compiler support templates, I would copy the boost headers in my code to allow me to use this one instead of raw arrays. Because it's clearly too easy to make mistakes with raw arrays. Raw arrays are evil. They are error prone.

And it work very well with STL algorithms (if available).

Now, there is two cases where you need to use raw arrays (obligation) : when you're in C-only code (not communicating with C code, but writing in C-only part of code, like a C library). But then it's another language.

The other reason is when the compiler don't support templates at all...



回答3:

This question could actually be separated into two parts:

  1. How should I manage memory for flat array data?
  2. How should I access elements of a flat array?

I personally prefer to use std::vector for managing memory except in cases where I need to maintain compatibility with code that doesn't use STL (i.e. when interfacing with straight C code). It's much harder to make exception-safe code with raw arrays allocated via new or malloc (in part because it's really easy to forget that you need to worry about it). See any article on RAII for the reasons.

In practice, std::vector is implemented as a flat array. As such, it's always possible to pull out the raw array and use C-style access patterns. I typically start with the vector subscript operator syntax. For some compilers, when producing a debug version, vectors provide automatic boundary checking. This is slow (often a 10x slowdown for tight loops), but helpful in finding certain types of bugs.

If profiling on a particular platform indicates that the operator[] is a bottleneck, then I switch to directly accessing the raw array. Interestingly, depending on the compiler and OS, it can sometimes be faster to use an STL vector than a raw array.

Here are some results from a simple test application. It was compiled with Visual Studio 2008 in 32-bit Release mode using /O2 optimizations and run on Vista x64. Similar results are achieved with a 64-bit test application.

Binary search...
           fill vector (for reference) :  0.27 s
                   array with ptr math :  0.38 s <-- C-style pointers lose
                  array with int index :  0.23 s <-- [] on raw array wins
            array with ptrdiff_t index :  0.24 s
                 vector with int index :  0.30 s  <-- small penalty for vector abstraction
           vector with ptrdiff_t index :  0.30 s

Counting memory (de)allocation...
                memset (for reference) :  2.85 s
      fill malloc-ed raw array with [] :  2.66 s
     fill malloc-ed raw array with ptr :  2.81 s
         fill new-ed raw array with [] :  2.64 s
        fill new-ed raw array with ptr :  2.65 s
                  fill vector as array :  3.06 s  \ something's slower 
                           fill vector :  3.05 s  / with vector!

NOT counting memory (de)allocation...
                memset (for reference) :  2.57 s
      fill malloc-ed raw array with [] :  2.86 s
     fill malloc-ed raw array with ptr :  2.60 s
         fill new-ed raw array with [] :  2.63 s
        fill new-ed raw array with ptr :  2.78 s
                  fill vector as array :  2.49 s \ after discounting the  
                           fill vector :  2.54 s / (de)allocation vector is faster!

Code:

#define WINDOWS_LEAN_AND_MEAN
#include <windows.h>
#include <string>
#include <vector>
#include <stdio.h>

using namespace std;

__int64 freq; // initialized in main
int const N = 1024*1024*1024/sizeof(int)/2; // 1/2 GB of data
int const nIter = 10;

class Timer {
public:
  Timer(char *name) : name(name) {
    QueryPerformanceCounter((LARGE_INTEGER*)&start);
  }
  ~Timer() {
    __int64 stop;
    QueryPerformanceCounter((LARGE_INTEGER*)&stop);
    printf("  %36s : % 4.2f s\n", name.c_str(), (stop - start)/double(freq));
  }
private:
  string const name;
  __int64 start;
};


template <typename Container, typename Index>
int binarySearch_indexed(Container sortedArray, Index first, Index last, int key) {
  while (first <= last) {
    Index mid = (first + last) / 2; // NOT safe if (first+last) is too big!
    if (key > sortedArray[mid])      first = mid + 1;
    else if (key < sortedArray[mid])  last = mid - 1; 
    else return mid;  
  }
  return 0; // Use "(Index)-1" in real code
}

int Dummy = -1;
int const *binarySearch_ptr(int const *first, int const *last, int key) {
  while (first <= last) {
    int const *mid = (int const *)(((unsigned __int64)first + (unsigned __int64)last) / 2);  
    if (key > *mid)      first = mid + 1;
    else if (key < *mid)  last = mid - 1; 
    else return mid;  
  }
  return &Dummy; // no NULL checks: don't do this for real
}

void timeFillWithAlloc() {
  printf("Counting memory (de)allocation...\n");
  { 
    Timer tt("memset (for reference)");
    int *data = (int*)malloc(N*sizeof(int));
    for (int it=0; it<nIter; it++) memset(data, 0, N*sizeof(int));
    free(data);
  }
  { 
    Timer tt("fill malloc-ed raw array with []");
    int *data = (int*)malloc(N*sizeof(int));
    for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    free(data);
  }
  { 
    Timer tt("fill malloc-ed raw array with ptr");
    int *data = (int*)malloc(N*sizeof(int));
    for (int it=0; it<nIter; it++) {
    int *d = data;
    for (size_t i=0; i<N; i++) *d++ = (int)i;
    }
    free(data);
  }
  { 
    Timer tt("fill new-ed raw array with []");
    int *data = new int[N];
    for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    delete [] data;
  }
  { 
    Timer tt("fill new-ed raw array with ptr");
    int *data = new int[N];
    for (int it=0; it<nIter; it++) {
    int *d = data;
    for (size_t i=0; i<N; i++) *d++ = (int)i;
    }
    delete [] data;
  }
  { 
    Timer tt("fill vector as array");
    vector<int> data(N); 
    for (int it=0; it<nIter; it++) {
      int *d = &data[0]; 
    for (size_t i=0; i<N; i++) *d++ = (int)i;
    }
  }
  { 
    Timer tt("fill vector");
    vector<int> data(N); 
    for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
  }
  printf("\n");
}

void timeFillNoAlloc() {
  printf("NOT counting memory (de)allocation...\n");

  { 
    int *data = (int*)malloc(N*sizeof(int));
    {
      Timer tt("memset (for reference)");
      for (int it=0; it<nIter; it++) memset(data, 0, N*sizeof(int));
    }
    free(data);
  }
  { 
    int *data = (int*)malloc(N*sizeof(int));
    {
      Timer tt("fill malloc-ed raw array with []");
      for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    }
    free(data);
  }
  { 
    int *data = (int*)malloc(N*sizeof(int));
    {
      Timer tt("fill malloc-ed raw array with ptr");
      for (int it=0; it<nIter; it++) {
        int *d = data;
        for (size_t i=0; i<N; i++) *d++ = (int)i;
      }
    }
    free(data);
  }
  { 
    int *data = new int[N];
    {
      Timer tt("fill new-ed raw array with []");
      for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    }
    delete [] data;
  }
  { 
    int *data = new int[N];
    {
      Timer tt("fill new-ed raw array with ptr");
      for (int it=0; it<nIter; it++) {
        int *d = data;
        for (size_t i=0; i<N; i++) *d++ = (int)i;
      }
    }
    delete [] data;
  }
  { 
    vector<int> data(N); 
    {
      Timer tt("fill vector as array");
      for (int it=0; it<nIter; it++) {
        int *d = &data[0]; 
        for (size_t i=0; i<N; i++) *d++ = (int)i;
      }
    }
  }
  { 
    vector<int> data(N); 
    {
      Timer tt("fill vector");
      for (int it=0; it<nIter; it++) for (size_t i=0; i<N; i++) data[i] = (int)i;
    }
  }
  printf("\n");
}

void timeBinarySearch() {
  printf("Binary search...\n");
  vector<int> data(N); 
  {
    Timer tt("fill vector (for reference)");
    for (size_t i=0; i<N; i++) data[i] = (int)i;
  }

  {
    Timer tt("array with ptr math");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += *binarySearch_ptr(&data[0], &data[0]+data.size(), i);
    }
  }
  {
    Timer tt("array with int index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<int const *, int>(
        &data[0], 0, (int)data.size(), -1)];
    }
  }
  {
    Timer tt("array with ptrdiff_t index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<int const *, ptrdiff_t>(
        &data[0], 0, (ptrdiff_t)data.size(), -1)];
    }
  }
  {
    Timer tt("vector with int index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<vector<int> const &, int>(
        data, 0, (int)data.size(), -1)];
    }
  }
  {
    Timer tt("vector with ptrdiff_t index");
    int sum = 0;
    for (int i=-1000000; i<1000000; i++) {
      sum += data[binarySearch_indexed<vector<int> const &, ptrdiff_t>(
        data, 0, (ptrdiff_t)data.size(), -1)];
    }
  }

  printf("\n");
}

int main(int argc, char **argv)
{
  QueryPerformanceFrequency((LARGE_INTEGER*)&freq);

  timeBinarySearch();
  timeFillWithAlloc();
  timeFillNoAlloc();

  return 0;
}


回答4:

Array/char* is useful when compatibility or performance is of very high priority. Vectors and Strings are higher-level objects that are better when code maintainability, readability and overal easiness counts. Almost always, that is.



回答5:

I suggest to use arrays whenever you know the size at the compile time itself. Although vector can be used, we have to remember that vector comes with its overhead related to the memory allocation done on heap. If the size is not known, then ofcourse use vectors.



回答6:

The only reason I could think of would be speed. You can do better optimizations on array/pointer types than on the according objects. But I would even use the STL if I absoluteltely knew the amount of data my data structures need to hold. Changing from STL to primitive types in an optimization step is better than starting the project with harder-to-read code.



回答7:

I see two reasons:

  1. Compatibility (old code without STL).
  2. Speed. (I compared the speed of using vector/binary_search & array/handwritten binary search. For the last one ugly code was obtained (with reallocation of memory), but it was about 1.2-1.5 times faster then first one, I used MS VC++ 8)


回答8:

I work on a shared library that needs access to structured data. This data is known at compile time, so it uses file-scoped constant arrays of POD (plain old data) structures to hold the data.

This causes the compiler and linker to put most of the data in a read-only section, with two benefits:

  • It can be mapped into memory directory from disk without running any special initialization code.
  • It can be shared between processes.

The only exception is that the compiler still generates initialization code to load floating point constants, so any structure containing floating point numbers ends up in a writable section. I suspect that this has something do with floating exceptions or floating point rounding modes, but I'm not sure how to verify either hypothesis.

If I used vector and string objects for this, then the compiler would generate a lot more initialization code that would execute whenever my shared library is loaded. The constant data would be allocated on the heap, so it would not be shareable between processes.

If I read the data in from a file on disk, I would have to deal with checking the format of the data, instead of having the C++ compiler do it for me. I also would have to manage the lifetime of the data in a codebase that has had this global data "baked into" it since the beginning.