Should C++ programmers avoid memset?

Posted 2019-01-13 14:44

Question:

I have heard it said that C++ programmers should avoid memset. Consider:

class ArrInit {
    //! int a[1024] = { 0 };
    int a[1024];
public:
    ArrInit() {  memset(a, 0, 1024 * sizeof(int)); }
};

So, considering the code above: if you do not use memset, how would you fill a[0..1023] with zeros? What's wrong with memset in C++?

Thanks.

Answer 1:

The issue is not so much using memset() on the built-in types as using it on class (i.e. non-POD) types. Doing so will almost always do the wrong thing and frequently do the fatal thing: it may, for example, trample over a virtual function table pointer.



Answer 2:

In C++ std::fill or std::fill_n may be a better choice, because it is generic and therefore can operate on objects as well as PODs. However, memset operates on a raw sequence of bytes, and should therefore never be used to initialize non-PODs. Regardless, optimized implementations of std::fill may internally use specialization to call memset if the type is a POD.
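A minimal sketch of the std::fill approach, assuming the asker's 1024-element int array. The names FillInit, third_after_fill and check_fill are hypothetical, introduced only for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class: zeroes a built-in array with std::fill instead of memset.
class FillInit {
    int a[1024];
public:
    FillInit() { std::fill(a, a + 1024, 0); }
    int at(std::size_t i) const { return a[i]; }
};

// std::fill_n also works on non-POD element types, where memset would be unsafe.
inline std::string third_after_fill() {
    std::string names[4];
    std::fill_n(names, 4, std::string("none"));
    return names[2];
}

// Small self-check; call it from main() or a test.
inline void check_fill() {
    FillInit f;
    assert(f.at(0) == 0 && f.at(1023) == 0);
    assert(third_after_fill() == "none");
}
```

The same call works unchanged if the element type later becomes a class, which is the point made above about std::fill being generic.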



Answer 3:

Zero-initializing should look like this:

class ArrInit {
    int a[1024];
public:
    ArrInit(): a() { }
};
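As a hedged illustration, the sketch below shows the value-initialization form above next to a C++11 alternative. ArrInitParens, ArrInitBraces and check_value_init are hypothetical names, and the second class assumes a C++11 or later compiler:

```cpp
#include <cassert>
#include <cstddef>

// C++03 style: the empty initializer a() value-initializes the array,
// which sets every int element to 0.
class ArrInitParens {
    int a[1024];
public:
    ArrInitParens() : a() {}
    int at(std::size_t i) const { return a[i]; }
};

// C++11 style (assumption: C++11 or later): a default member initializer
// zeroes the array without a hand-written constructor.
class ArrInitBraces {
    int a[1024] = {};
public:
    int at(std::size_t i) const { return a[i]; }
};

// Small self-check; call it from main() or a test.
inline void check_value_init() {
    ArrInitParens p;
    ArrInitBraces b;
    for (std::size_t i = 0; i < 1024; ++i) {
        assert(p.at(i) == 0);
        assert(b.at(i) == 0);
    }
}
```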

As for memset, there are a couple of ways to make its use more robust (as with all such functions). First, avoid hard-coding the array's size and type:

memset(a, 0, sizeof(a));

For extra compile-time checks it is also possible to make sure that a indeed is an array (so sizeof(a) would make sense):

template <class T, size_t N>
size_t array_bytes(const T (&)[N])  //accepts only real arrays
{
    return sizeof(T) * N;
}

ArrInit() { memset(a, 0, array_bytes(a)); }

But for non-character types, I'd imagine the only value you'd use it to fill with is 0, and zero-initialization should already be available in one way or another.



Answer 4:

What's wrong with memset in C++ is mostly the same thing that's wrong with memset in C. memset fills a memory region with the physical all-zero bit pattern, while in virtually 100% of cases what you need is to fill an array with logical zero values of the corresponding type. In C, memset is only guaranteed to properly initialize memory for integer types (and its validity for all integer types, as opposed to just char types, is a relatively recent guarantee added to the C language specification). It is not guaranteed to set floating-point values to zero, and it is not guaranteed to produce proper null pointers.

Of course, the above might be seen as excessively pedantic, since the additional standards and conventions in force on a given platform might (and almost certainly will) extend the applicability of memset. I would still suggest following Occam's razor here: don't rely on any other standards and conventions unless you really have to. C++ (as well as C) offers several language-level features that let you safely initialize your aggregate objects with proper zero values of the proper type. Other answers have already mentioned these features.
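A minimal sketch of one such language-level feature, value-initialization, applied to an aggregate that mixes an integer, a floating-point value and a pointer. Record, make_zeroed_record and check_zeroed_record are hypothetical names used only for illustration:

```cpp
#include <cassert>

// Hypothetical aggregate mixing the member kinds discussed above.
struct Record {
    int count;
    double ratio;
    const char* label;
};

// Record() value-initializes the aggregate: each member receives the logical
// zero of its own type (0, 0.0, null pointer), whatever the bit pattern is.
inline Record make_zeroed_record() {
    Record r = Record();
    return r;
}

// Small self-check; call it from main() or a test.
inline void check_zeroed_record() {
    Record r = make_zeroed_record();
    assert(r.count == 0);
    assert(r.ratio == 0.0);
    assert(!r.label);
}
```

Unlike memset, this does not depend on the platform representing 0.0 or a null pointer as all-zero bytes.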



Answer 5:

It is "bad" because you are not implementing your intent.

Your intent is to set each value in the array to zero and what you have programmed is setting an area of raw memory to zero. Yes, the two things have the same effect but it's clearer to simply write code to zero each element.

Also, it's likely no more efficient.

class ArrInit
{
public:
    ArrInit();
private:
    int a[1024];
};

ArrInit::ArrInit()
{
    for(int i = 0; i < 1024; ++i) {
        a[i] = 0;
    }
}


int main()
{
    ArrInit a;
}

Compiling this with Visual C++ 2008 (32-bit) with optimisations turned on turns the loop into:

; Line 12
    xor eax, eax
    mov ecx, 1024               ; 00000400H
    mov edi, edx
    rep stosd

That is pretty much exactly what the memset would likely compile to anyway. But if you use memset, the compiler has less scope to perform further optimisations. When you write your intent, the compiler may be able to optimise further, for example by noticing that each element is later set to something else before it is used, so the initialisation can be removed entirely. It likely couldn't do that nearly as easily if you had used memset.



Answer 6:

This is an OLD thread, but here's an interesting twist:

class myclass
{
  virtual void somefunc();
};

myclass onemyclass;

memset(&onemyclass,0,sizeof(myclass));

works PERFECTLY well!

However,

myclass *myptr;

myptr=&onemyclass;

memset(myptr,0,sizeof(myclass));

indeed sets the virtuals (i.e. somefunc() above) to NULL.

Given that memset is drastically faster than setting to 0 each and every member in a large class, I've been doing the first memset above for ages and never had a problem.

So the really interesting question is: how come it works? I suppose that the compiler actually starts to set the zeros BEYOND the virtual table... Any ideas?



Answer 7:

Your code is fine. I thought the only time memset is dangerous in C++ is when you do something along the lines of:
YourClass instance; memset(&instance, 0, sizeof(YourClass));

I believe it might zero out internal data in your instance that the compiler created.



Answer 8:

In addition to badness when applied to classes, memset is also error prone. It's very easy to get the arguments out-of-order, or to forget the sizeof portion. The code will usually compile with these errors, and quietly do the wrong thing. The symptom of the bug might not manifest until much later, making it difficult to track down.

memset is also problematic with lots of plain types, like pointers and floating point. Some programmers set all bytes to 0, assuming the pointers will then be NULL and floats will be 0.0. That's not a portable assumption.
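One way to remove the opportunity for swapped arguments or a forgotten sizeof is to wrap the call. The sketch below is a hypothetical helper (zero_array is not a standard function) and assumes C++11 for static_assert and <type_traits>. It deliberately accepts only arrays of integer types, since all-zero bytes are not guaranteed to mean 0.0 or a null pointer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: zeroes a real array of integers.
// The size is deduced from the array type, so the caller cannot pass a wrong
// byte count or swap the value and length arguments.
template <class T, std::size_t N>
void zero_array(T (&arr)[N]) {
    static_assert(std::is_integral<T>::value,
                  "zero_array is limited to integer element types");
    std::memset(arr, 0, sizeof(arr));
}

// Small self-check; call it from main() or a test.
inline void check_zero_array() {
    int buf[8];
    for (int i = 0; i < 8; ++i) buf[i] = 7;
    zero_array(buf);
    for (int i = 0; i < 8; ++i) assert(buf[i] == 0);
}
```

Passing a pointer, or an array of float or of pointers, fails to compile instead of quietly doing the wrong thing.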



Answer 9:

There's no real reason to not use it except for the few cases people pointed out that no one would use anyway, but there's no real benefit to using it either unless you are filling memguards or something.



Answer 10:

The short answer would be to use a std::vector with an initial size of 1024.

std::vector<int> a(1024); // Uses the type's default constructor, "T()".

The initial value of all elements of "a" is 0, because the std::vector(size) constructor (as well as vector::resize) value-initializes every element; in C++03 it does so by copying a default-constructed T(). For built-in types (a.k.a. intrinsic types), value-initialization is guaranteed to produce 0:

int x = int(); // x == 0

This would allow the type that "a" uses to change with minimal fuss, even to that of a class.
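A minimal sketch of this suggestion applied to the asker's class. ArrInitVec, default_name_is_empty and check_vector_init are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical rewrite of the asker's class with std::vector as storage.
class ArrInitVec {
    std::vector<int> a;
public:
    ArrInitVec() : a(1024) {}   // 1024 elements, each value-initialized to 0
    int at(std::size_t i) const { return a[i]; }
    std::size_t size() const { return a.size(); }
};

// Switching the element type to a class needs no other change:
// every element is default-constructed (an empty string here).
inline bool default_name_is_empty() {
    std::vector<std::string> names(1024);
    return names[0].empty();
}

// Small self-check; call it from main() or a test.
inline void check_vector_init() {
    ArrInitVec v;
    assert(v.size() == 1024);
    assert(v.at(0) == 0 && v.at(1023) == 0);
    assert(default_name_is_empty());
}
```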

Most functions that take a void pointer (void*) as a parameter, such as memset, are not type-safe. Ignoring an object's type in this way bypasses the C++ semantics that objects tend to rely on, such as construction, destruction and copying. memset makes assumptions about the inside of a class, which violates abstraction (not knowing or caring what is inside a class). While this violation isn't always immediately obvious, especially with intrinsic types, it can lead to hard-to-locate bugs, especially as the code base grows and changes hands. If the type being memset is a class with a vtable (virtual functions), memset will also overwrite that data.



Answer 11:

In C++ you should let constructors do the initialization, for example by creating objects with new. With simple arrays like the one in your example, there is no real problem with using memset. However, if you had an array of class objects and used memset to initialize it, you wouldn't be constructing the objects properly.

Consider this:

#include <cstring>

class A {
    int i;
public:
    A() : i(5) {}
};

int main() {
    A a[10];
    memset(a, 0, 10 * sizeof(A));
}

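The constructor does run for each element when the array is defined, but memset then overwrites every object with zero bytes, so the member variable i ends up as 0 instead of 5. If you simply let construction do the work, for example with new:

A* a = new A[10];  // release later with delete[] a;

then each element in the array has its constructor called and i is set to 5.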
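A hedged sketch of the same point without manual memory management. Five, sum_of_defaults and check_constructed are hypothetical names, and std::vector is swapped in for new[] so that nothing has to be deleted by hand:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class whose constructor establishes a non-zero default.
class Five {
    int i;
public:
    Five() : i(5) {}
    int value() const { return i; }
};

// Ten elements, each built by Five's constructor; no memset is involved.
inline int sum_of_defaults() {
    std::vector<Five> v(10);
    int sum = 0;
    for (std::size_t k = 0; k < v.size(); ++k) sum += v[k].value();
    return sum;
}

// Small self-check; call it from main() or a test.
inline void check_constructed() {
    Five plain[10];                 // a plain array also runs the constructor
    for (int k = 0; k < 10; ++k) assert(plain[k].value() == 5);
    assert(sum_of_defaults() == 50);
}
```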