Question:
There are two ways to zero out an integer/float array:
memset(array, 0, sizeof(int)*arraysize);
or:
for (int i=0; i <arraysize; ++i)
array[i]=0;
Obviously, memset is faster for a large arraysize. However, at what point is the overhead of memset actually larger than the overhead of the for loop? For example, for an array of size 5 - which would be best? The first, the second, or maybe even the unrolled version:
array[0] = 0;
array[1] = 0;
array[2] = 0;
array[3] = 0;
array[4] = 0;
Answer 1:
In all likelihood, memset() will be inlined by your compiler (most compilers treat it as an 'intrinsic', which basically means it's inlined, except maybe at the lowest optimizations or unless explicitly disabled).
For example, here are some release notes from GCC 4.3:
Code generation of block move (memcpy) and block set (memset) was rewritten. GCC can now pick the best algorithm (loop, unrolled loop, instruction with rep prefix or a library call) based on the size of the block being copied and the CPU being optimized for. A new option -minline-stringops-dynamically has been added. With this option string operations of unknown size are expanded such that small blocks are copied by in-line code, while for large blocks a library call is used. This results in faster code than -minline-all-stringops when the library implementation is capable of using cache hierarchy hints. The heuristic choosing the particular algorithm can be overwritten via -mstringop-strategy. Newly also memset of values different from 0 is inlined.
It might be possible for the compiler to do something similar with the alternative examples you gave, but I'd bet it's less likely to.
And it's grep-able and more immediately obvious at a glance what the intent is, to boot (not that the loop is particularly difficult to grok either).
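As a quick way to see this for yourself (my own hedged sketch, not part of the original answer; the file and function names are made up for illustration), you can compile a small translation unit to assembly and look at what your compiler actually emits:

/* zero_check.c - hypothetical example; assumes a reasonably recent GCC or Clang.
 * Inspect the generated assembly with:  gcc -O2 -S zero_check.c
 */
#include <string.h>

void zero_small(int *array)
{
    /* Fixed, known size: most optimizing compilers expand this inline
     * as a handful of stores rather than calling the library memset. */
    memset(array, 0, 5 * sizeof(int));
}

void zero_dynamic(int *array, size_t n)
{
    /* Size unknown at compile time: the compiler may keep the library
     * call or use a rep-prefixed instruction, depending on the target. */
    memset(array, 0, n * sizeof(int));
}

What you see in the .s file will vary by compiler version, optimization level and target, but for the fixed-size case you should rarely find an actual call to memset.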
Answer 2:
As Michael already noted, gcc (and, I would guess, most other compilers) already optimizes this very well. For example, gcc turns this
char arr[5];
memset(arr, 0, sizeof arr);
into
movl $0x0, <arr+0x0>
movb $0x0, <arr+0x4>
It doesn't get any better than that...
Answer 3:
There's no way of answering the question without measuring. It will depend entirely on the compiler, CPU and runtime library implementations.
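To illustrate what "measuring" could look like (my own sketch, not part of the original answer; the crossover point you find will depend entirely on your compiler, flags and CPU), a crude timing harness might be:

#include <stdio.h>
#include <string.h>
#include <time.h>

#define N    5          /* array size under test - vary this to look for a crossover */
#define REPS 10000000L  /* repetitions, so the timings are large enough to measure */

static int array[N];

int main(void)
{
    volatile int sink = 0;   /* read results back so the work is not optimized away */
    clock_t t0, t1;

    t0 = clock();
    for (long r = 0; r < REPS; ++r) {
        array[r % N] = (int)r;            /* dirty one element so the clear is not dead code */
        memset(array, 0, sizeof array);
        sink += array[r % N];
    }
    t1 = clock();
    printf("memset: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (long r = 0; r < REPS; ++r) {
        array[r % N] = (int)r;
        for (int i = 0; i < N; ++i)
            array[i] = 0;
        sink += array[r % N];
    }
    t1 = clock();
    printf("loop:   %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);

    return sink ? 1 : 0;
}

Bear in mind that a sufficiently aggressive optimizer may collapse both variants into the same code, which is rather the point the other answers make.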
memset() can be a bit of a "code smell", because it is prone to buffer overflows and reversed parameters, and it can only clear memory byte-wise. However, it's a safe bet that it will be 'fastest' in all but extreme cases.
I tend to use a macro to wrap this to avoid some of the issues:
#define CLEAR(s) memset(&(s), 0, sizeof(s))
This sidesteps the size calculation and removes the problem of swapping the length and value parameters.
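For illustration (my addition, not from the original answer): the macro works on objects whose size is visible in the current scope, such as local structs and arrays, but it silently does the wrong thing if you hand it a pointer, because sizeof then measures the pointer itself:

#include <string.h>

#define CLEAR(s) memset(&(s), 0, sizeof(s))

struct point { int x, y; };

void example(void)
{
    struct point p;
    int buf[5];

    CLEAR(p);    /* clears the whole struct */
    CLEAR(buf);  /* clears all 5 ints - sizeof(buf) is the full array size */

    int *ptr = buf;
    CLEAR(ptr);  /* almost certainly not the intent: this zeroes the pointer
                    variable itself, not the array it points to, because
                    sizeof(ptr) is only the size of a pointer */
}

The first two uses do what you would expect; the last is the classic pitfall the macro cannot protect against.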
In short, use memset() "under the hood". Write what you intend, and let the compiler worry about optimizations. Most are incredibly good at it.
Answer 4:
Considering this code in isolation, everything has already been said. But if you consider it in the context of its program, about which I know nothing, something else can be done. For example, if this code runs periodically to clear an array, you could run a thread that keeps allocating fresh zero-filled arrays and assigns them to a global variable; when your code needs the array cleared, it simply points to the fresh one.
This is a third option. Of course, it only makes sense if you plan to run your code on a processor with at least two cores, and the code must run more than once to see the benefit. For a one-time run, you could declare an array filled with zeros and then point to it when needed. A rough sketch of the swap-in-a-pre-zeroed-array idea follows.
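A minimal sketch of that idea (my own illustration, not the answerer's code; it assumes POSIX threads, a single consumer, and hypothetical names like zeroer and take_cleared_array, and it glosses over shutdown and error handling a real implementation would need):

#include <pthread.h>
#include <stdlib.h>

#define ARRAY_SIZE 1024

static int *ready_array;                 /* pre-zeroed array produced in the background */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  refilled = PTHREAD_COND_INITIALIZER;

/* Background thread: whenever the standby array has been taken,
 * allocate and zero a replacement so a cleared array is always ready. */
static void *zeroer(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (ready_array != NULL)
            pthread_cond_wait(&refilled, &lock);
        ready_array = calloc(ARRAY_SIZE, sizeof(int));  /* calloc returns zeroed memory */
        pthread_cond_signal(&refilled);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Consumer side: instead of clearing its current array, it swaps in the
 * pre-zeroed one and hands the old one back to be freed or recycled. */
static int *take_cleared_array(int *old)
{
    int *fresh;
    pthread_mutex_lock(&lock);
    while (ready_array == NULL)
        pthread_cond_wait(&refilled, &lock);
    fresh = ready_array;
    ready_array = NULL;
    pthread_cond_signal(&refilled);   /* wake the zeroer to refill */
    pthread_mutex_unlock(&lock);
    free(old);                        /* or keep it around for reuse */
    return fresh;
}

You would start the zeroer once with pthread_create; for a single-threaded or one-shot program, the simpler suggestion above (a static zero-filled array you point to) is usually enough.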
Hope this may help someone.