Clearing a small integer array: memset vs. for loop

Posted 2019-01-21 08:48

There are two ways to zero out an integer/float array:

memset(array, 0, sizeof(int)*arraysize);

or:

for (int i = 0; i < arraysize; ++i)
    array[i] = 0;

Obviously, memset is faster for a large arraysize. However, at what point does the overhead of memset actually exceed the overhead of the for loop? For example, for an array of size 5, which would be best? The first, the second, or maybe even the unrolled version:

array[0] = 0;
array[1] = 0;
array[2] = 0;
array[3] = 0;
array[4] = 0;

4 Answers
够拽才男人
#2 · 2019-01-21 09:19

There's no way of answering the question without measuring. It will depend entirely on the compiler, CPU, and runtime library implementations.

memset() can be a bit of a "code smell", because it is prone to buffer overflows and parameter reversals, and has the unfortunate limitation of only clearing byte-wise. However, it's a safe bet that it will be fastest in all but extreme cases.

I tend to use a macro to wrap this to avoid some of the issues:

#define CLEAR(s) memset(&(s), 0, sizeof(s))

This sidesteps the size calculations and removes the problem of swapping the length and value parameters.

In short, use memset() "under the hood". Write what you intend, and let the compiler worry about optimizations. Most are incredibly good at it.

家丑人穷心不美
#3 · 2019-01-21 09:20

In all likelihood, memset() will be inlined by your compiler (most compilers treat it as an 'intrinsic', which basically means it's inlined, except maybe at the lowest optimizations or unless explicitly disabled).

For example, here are some release notes from GCC 4.3:

Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for. A new option -minline-stringops-dynamically has been added. With this option string operations of unknown size are expanded such that small blocks are copied by in-line code, while for large blocks a library call is used. This results in faster code than -minline-all-stringops when the library implementation is capable of using cache hierarchy hints. The heuristic choosing the particular algorithm can be overwritten via -mstringop-strategy. Newly also memset of values different from 0 is inlined.

It might be possible for the compiler to do something similar with the alternative examples you gave, but I'd bet it's less likely to.

And it's grep-able and more immediately obvious at a glance what the intent is to boot (not that the loop is particularly difficult to grok either).

孤傲高冷的网名
#4 · 2019-01-21 09:22

As Michael already noted, GCC (and I'd guess most other compilers) already optimizes this very well. For example, GCC turns this

char arr[5];
memset(arr, 0, sizeof arr);

into

movl  $0x0, <arr+0x0>
movb  $0x0, <arr+0x4>

It doesn't get any better than that...

冷血范
#5 · 2019-01-21 09:32

Considering this code in isolation, everything has already been said. But if you consider it in the context of its program, about which I know nothing, something else can be done. For example, if this code runs periodically to clear an array, you could run a thread that constantly allocates new zero-filled arrays and assigns them to a global variable; when your code needs the array cleared, it simply repoints to the fresh one.

This is a third option. Of course, it only makes sense if you plan to run your code on a processor with at least two cores, and the code must run more than once for the benefit to show. For a one-time run, you could simply declare an array filled with zeros and point to it when needed.

Hope this helps someone.
