Why does rand() + rand() produce negative numbers?

2019-01-29 19:54发布

站内文章 / C

60 0

来，给爷笑一个

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I observed that rand() library function when it is called just once within a loop, it almost always produces positive numbers.

for (i = 0; i < 100; i++) {
    printf("%d\n", rand());
}

But when I add two rand() calls, the numbers generated now have more negative numbers.

for (i = 0; i < 100; i++) {
    printf("%d = %d\n", rand(), (rand() + rand()));
}

Can someone explain why I am seeing negative numbers in the second case?

PS: I initialize the seed before the loop as srand(time(NULL)).

回答1:

rand() is defined to return an integer between 0 and RAND_MAX.

rand() + rand()

could overflow. What you observe is likely a result of undefined behaviour caused by integer overflow.

回答2:

The problem is the addition. rand() returns an int value of 0...RAND_MAX. So, if you add two of them, you will get up to RAND_MAX * 2. If that exceeds INT_MAX, the result of the addition overflows the valid range an int can hold. Overflow of signed values is undefined behaviour and may lead to your keyboard talking to you in foreign tongues.

As there is no gain here in adding two random results, the simple idea is to just not do it. Alternatively you can cast each result to unsigned int before the addition if that can hold the sum. Or use a larger type. Note that long is not necessarily wider than int, the same applies to long long if int is at least 64 bits!

Conclusion: Just avoid the addition. It does not provide more "randomness". If you need more bits, you might concatenate the values sum = a + b * (RAND_MAX + 1), but that also likely requires a larger data type than int.

As your stated reason is to avoid a zero-result: That cannot be avoided by adding the results of two rand() calls, as both can be zero. Instead, you can just increment. If RAND_MAX == INT_MAX, this cannot be done in int. However, (unsigned int)rand() + 1 will do very, very likely. Likely (not definitively), because it does require UINT_MAX > INT_MAX, which is true on all implementations I'm aware of (which covers quite some embedded architectures, DSPs and all desktop, mobile and server platforms of the past 30 years).

Warning:

Although already sprinkled in comments here, please note that adding two random values does not get a uniform distribution, but a triangular distribution like rolling two dice: to get 12 (two dice) both dice have to show 6. for 11 there are already two possible variants: 6 + 5 or 5 + 6, etc.

So, the addition is also bad from this aspect.

Also note that the results rand() generates are not independent of each other, as they are generated by a pseudorandom number generator. Note also that the standard does not specify the quality or uniform distribution of the calculated values.

回答3:

This is an answer to a clarification of the question made in comment to this answer,

the reason i was adding was to avoid '0' as the random number in my code. rand()+rand() was the quick dirty solution which readily came to my mind.

The problem was to avoid 0. There are (at least) two problems with the proposed solution. One is, as the other answers indicate, that rand()+rand() can invoke undefined behavior. Best advice is to never invoke undefined behavior. Another issue is there's no guarantee that rand() won't produce 0 twice in a row.

The following rejects zero, avoids undefined behavior, and in the vast majority of cases will be faster than two calls to rand():

int rnum;
for (rnum = rand(); rnum == 0; rnum = rand()) {}
// or do rnum = rand(); while (rnum == 0);

回答4:

Basically rand() produce numbers between 0 and RAND_MAX, and 2 RAND_MAX > INT_MAX in your case.

You can modulus with the max value of your data-type to prevent overflow. This ofcourse will disrupt the distribution of the random numbers, but rand is just a way to get quick random numbers.

#include <stdio.h>
#include <limits.h>

int main(void)
{
    int i=0;

    for (i=0; i<100; i++)
        printf(" %d : %d \n", rand(), ((rand() % (INT_MAX/2))+(rand() % (INT_MAX/2))));

    for (i=0; i<100; i++)
        printf(" %d : %ld \n", rand(), ((rand() % (LONG_MAX/2))+(rand() % (LONG_MAX/2))));

    return 0;
}

回答5:

May be you could try rather a tricky approach by ensuring that the value returned by sum of 2 rand() never exceeds the value of RAND_MAX. A possible approach could be sum = rand()/2 + rand()/2; This would ensure that for a 16 bit compiler with RAND_MAX value of 32767 even if both rand happens to return 32767, even then (32767/2 = 16383) 16383+16383 = 32766, thus would not result in negative sum.

回答6:

the reason i was adding was to avoid '0' as the random number in my code. rand()+rand() was the quick dirty solution which readily came to my mind.

A simple solution (okay, call it a "Hack") which never produces a zero result and will never overflow is:

x=(rand()/2)+1    // using divide  -or-
x=(rand()>>1)+1   // using shift which may be faster
                  // compiler optimization may use shift in both cases

This will limit your maximum value, but if you don't care about that, then this should work fine for you.

回答7:

To avoid 0, try this:

int rnumb = rand()%(INT_MAX-1)+1;

You need to include limits.h.

回答8:

While what everyone else has said about the likely overflow could very well be the cause of the negative, even when you use unsigned integers. The real problem is actually using time/date functionality as the seed. If you have truly become familiar with this functionality you will know exactly why I say this. As what it really does is give a distance (elapsed time) since a given date/time. While the use of the date/time functionality as the seed to a rand(), is a very common practice, it really is not the best option. You should search better alternatives, as there are many theories on the topic and I could not possibly go into all of them. You add into this equation the possibility of overflow and this approach was doomed from the beginning.

Those that posted the rand()+1 are using the solution that most use in order to guarantee that they do not get a negative number. But, that approach is really not the best way either.

The best thing you can do is take the extra time to write and use proper exception handling, and only add to the rand() number if and/or when you end up with a zero result. And, to deal with negative numbers properly. The rand() functionality is not perfect, and therefore needs to be used in conjunction with exception handling to ensure that you end up with the desired result.

Taking the extra time and effort to investigate, study, and properly implement the rand() functionality is well worth the time and effort. Just my two cents. Good luck in your endeavors...