Is uninitialized local variable the fastest random

2019-01-01 04:34发布

I know the uninitialized local variable is undefined behaviour(UB), and also the value may have trap representations which may affect further operation, but sometimes I want to use the random number only for visual representation and will not further use them in other part of program, for example, set something with random color in a visual effect, for example:

void updateEffect(){
    for(int i=0;i<1000;i++){
        int r;
        int g;
        int b;
        star[i].setColor(r%255,g%255,b%255);
        bool isVisible;
        star[i].setVisible(isVisible);
    }
}

is it that faster than

void updateEffect(){
    for(int i=0;i<1000;i++){
        star[i].setColor(rand()%255,rand()%255,rand()%255);
        star[i].setVisible(rand()%2==0?true:false);
    }
}

and also faster than other random number generator?

22条回答
呛了眼睛熬了心
2楼-- · 2019-01-01 05:10

Let me say this clearly: we do not invoke undefined behavior in our programs. It is never ever a good idea, period. There are rare exceptions to this rule; for example, if you are a library implementer implementing offsetof. If your case falls under such an exception you likely know this already. In this case we know using uninitialized automatic variables is undefined behavior.

Compilers have become very aggressive with optimizations around undefined behavior and we can find many cases where undefined behavior has lead to security flaws. The most infamous case is probably the Linux kernel null pointer check removal which I mention in my answer to C++ compilation bug? where a compiler optimization around undefined behavior turned a finite loop into an infinite one.

We can read CERT's Dangerous Optimizations and the Loss of Causality (video) which says, amongst other things:

Increasingly, compiler writers are taking advantage of undefined behaviors in the C and C++ programming languages to improve optimizations.

Frequently, these optimizations are interfering with the ability of developers to perform cause-effect analysis on their source code, that is, analyzing the dependence of downstream results on prior results.

Consequently, these optimizations are eliminating causality in software and are increasing the probability of software faults, defects, and vulnerabilities.

Specifically with respect to indeterminate values, the C standard defect report 451: Instability of uninitialized automatic variables makes for some interesting reading. It has not been resolved yet but introduces the concept of wobbly values which means the indeterminatness of a value may propagate through the program and can have different indeterminate values at different points in the program.

I don't know of any examples where this happens but at this point we can't rule it out.

Real examples, not the result you expect

You are unlikely to get random values. A compiler could optimize the away the loop altogether. For example, with this simplified case:

void updateEffect(int  arr[20]){
    for(int i=0;i<20;i++){
        int r ;    
        arr[i] = r ;
    }
}

clang optimizes it away (see it live):

updateEffect(int*):                     # @updateEffect(int*)
    retq

or perhaps get all zeros, as with this modified case:

void updateEffect(int  arr[20]){
    for(int i=0;i<20;i++){
        int r ;    
        arr[i] = r%255 ;
    }
}

see it live:

updateEffect(int*):                     # @updateEffect(int*)
    xorps   %xmm0, %xmm0
    movups  %xmm0, 64(%rdi)
    movups  %xmm0, 48(%rdi)
    movups  %xmm0, 32(%rdi)
    movups  %xmm0, 16(%rdi)
    movups  %xmm0, (%rdi)
    retq

Both of these cases are perfectly acceptable forms of undefined behavior.

Note, if we are on an Itanium we could end up with a trap value:

[...]if the register happens to hold a special not-a-thing value, reading the register traps except for a few instructions[...]

Other important notes

It is interesting to note the variance between gcc and clang noted in the UB Canaries project over how willing they are to take advantage of undefined behavior with respect to uninitialized memory. The article notes (emphasis mine):

Of course we need to be completely clear with ourselves that any such expectation has nothing to do with the language standard and everything to do with what a particular compiler happens to do, either because the providers of that compiler are unwilling to exploit that UB or just because they have not gotten around to exploiting it yet. When no real guarantee from the compiler provider exists, we like to say that as-yet unexploited UBs are time bombs: they’re waiting to go off next month or next year when the compiler gets a bit more aggressive.

As Matthieu M. points out What Every C Programmer Should Know About Undefined Behavior #2/3 is also relevant to this question. It says amongst other things (emphasis mine):

The important and scary thing to realize is that just about any optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above.

To me, this is deeply dissatisfying, partially because the compiler inevitably ends up getting blamed, but also because it means that huge bodies of C code are land mines just waiting to explode.

For completeness sake I should probably mention that implementations can choose to make undefined behavior well defined, for example gcc allows type punning through unions while in C++ this seems like undefined behavior. If this is the case the implementation should document it and this will usually not be portable.

查看更多
闭嘴吧你
3楼-- · 2019-01-01 05:10

You need to have a definition of what you mean by 'random'. A sensible definition involves that the values you get should have little correlation. That's something you can measure. It's also not trivial to achieve in a controlled, reproducible manner. So undefined behaviour is certainly not what you are looking for.

查看更多
谁念西风独自凉
4楼-- · 2019-01-01 05:10

There are certain situations in which uninitialized memory may be safely read using type "unsigned char*" [e.g. a buffer returned from malloc]. Code may read such memory without having to worry about the compiler throwing causality out the window, and there are times when it may be more efficient to have code be prepared for anything memory might contain than to ensure that uninitialized data won't be read (a commonplace example of this would be using memcpy on partially-initialized buffer rather than discretely copying all of the elements that contain meaningful data).

Even in such cases, however, one should always assume that if any combination of bytes will be particularly vexatious, reading it will always yield that pattern of bytes (and if a certain pattern would be vexatious in production, but not in development, such a pattern won't appear until code is in production).

Reading uninitialized memory might be useful as part of a random-generation strategy in an embedded system where one can be sure the memory has never been written with substantially-non-random content since the last time the system was powered on, and if the manufacturing process used for the memory causes its power-on state to vary in semi-random fashion. Code should work even if all devices always yield the same data, but in cases where e.g. a group of nodes each need to select arbitrary unique IDs as quickly as possible, having a "not very random" generator which gives half the nodes the same initial ID might be better than not having any initial source of randomness at all.

查看更多
人间绝色
5楼-- · 2019-01-01 05:10

Use 7757 every place you are tempted to use uninitialized variables. I picked it randomly from a list of prime numbers:

  1. it is defined behavior

  2. it is guaranteed to not always be 0

  3. it is prime

  4. it is likely to be as statistically random as uninitualized variables

  5. it is likely to be faster than uninitialized variables since its value is known at compile time

查看更多
弹指情弦暗扣
6楼-- · 2019-01-01 05:11

Not mentioned yet, but code paths that invoke undefined behavior are allowed to do whatever the compiler wants, e.g.

void updateEffect(){}

Which is certainly faster than your correct loop, and because of UB, is perfectly conformant.

查看更多
妖精总统
7楼-- · 2019-01-01 05:12

Your particular code example would probably not do what you are expecting. While technically each iteration of the loop re-creates the local variables for the r, g, and b values, in practice it's the exact same memory space on the stack. Hence it won't get re-randomized with each iteration, and you will end up assigning the same 3 values for each of the 1000 colors, regardless of how random the r, g, and b are individually and initially.

Indeed, if it did work, I would be very curious as to what's re-randomizing it. The only thing I can think of would be an interleaved interrupt that piggypacked atop that stack, highly unlikely. Perhaps internal optimization that kept those as register variables rather than as true memory locations, where the registers get re-used further down in the loop, would do the trick, too, especially if the set visibility function is particularly register-hungry. Still, far from random.

查看更多
登录 后发表回答