Question:
Background: I use rand(), std::rand(), std::random_shuffle() and other functions in my code for scientific calculations. To be able to reproduce my results, I always explicitly specify the random seed and set it via srand(). That worked fine until recently, when I found out that libxml2 also calls srand() lazily on its first use, which was after my early srand() call.
I filed a bug report with libxml2 about its srand() call, but I got the answer:
Initialize libxml2 first then.
That's a perfectly legal call to be made from a library. You should not expect that nobody else calls srand(), and the man page nowhere states that using srand() multiple times should be avoided.
This is actually my question now. If the general policy is that every lib can/should/will call srand(), and I can/might also call it here and there, I don't really see how that can be useful at all. Or how is rand() useful then?
That is why I thought the general (unwritten) policy was that no lib should ever call srand() and the application should call it only once, at the beginning. (Not taking multi-threading into account. I guess in that case you should use something different anyway.)
I also tried to research a bit which other libraries actually call srand(), but I didn't find any. Are there any?
My current workaround is this ugly code:
{
    // On the first call to xmlDictCreate,
    // libxml2 will initialize some internal randomize system,
    // which calls srand(time(NULL)).
    // So, do that first call here now, so that we can use our
    // own random seed.
    xmlDictPtr p = xmlDictCreate();
    xmlDictFree(p);
}
srand(my_own_seed);
Probably the only clean solution would be to not use that at all and to use only my own random generator (maybe via C++11 <random>). But that is not really the question. The question is, who should call srand(), and if everyone does it, how is rand() useful then?
Answer 1:
Use the new <random> header instead. It allows for multiple engine instances, using different algorithms and, more importantly for you, independent seeds.
[edit]
To answer the "useful" part, rand generates random numbers. That's what it's good for. If you need fine-grained control, including reproducibility, you should not only have a known seed but a known algorithm. srand at best gives you a fixed seed, so that's not a complete solution anyway.
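As a minimal sketch of the "known seed plus known algorithm" point: std::mt19937 is fully specified by the standard, so its raw output for a given seed is the same everywhere, and an engine object you own cannot be reseeded by another library. The function name draw_private is made up for illustration, and the std::srand call inside the loop only simulates a library meddling with the global state.

```cpp
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical helper: draw n raw values from an engine that only this
// function owns. The algorithm (MT19937) and the seed are both fixed.
std::vector<std::mt19937::result_type> draw_private(unsigned seed, int n) {
    std::mt19937 engine(seed);
    std::vector<std::mt19937::result_type> out;
    for (int i = 0; i < n; ++i) {
        // Simulates some library reseeding the global rand() state;
        // it has no effect on the private engine.
        std::srand(static_cast<unsigned>(i));
        out.push_back(engine());
    }
    return out;
}
```

Calling draw_private(42, 5) twice should give the same five values no matter what happens to srand in between. Note that the distributions in <random> (for example std::uniform_real_distribution) are not specified bit for bit across implementations, so cross-platform reproducibility is only guaranteed for the raw engine output.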
Answer 2:
Well, the obvious thing has been stated a few times by others, use the new C++11 generators. I'm restating it for a different reason, though.
You use the output for scientific calculations, and rand usually implements a rather poor generator (meanwhile, many mainstream implementations use MT19937, which apart from bad state recovery isn't so bad, but you have no guarantee of a particular algorithm, and at least one mainstream compiler still uses a really poor LCG).
Don't do scientific calculations with a poor generator. It doesn't really matter if you have things like hyperplanes in your random numbers if you do some silly game shooting little birds on your mobile phone, but it matters big time for scientific simulations. Don't ever use a bad generator. Don't.
Important note: std::random_shuffle (the version with two parameters) may actually call rand, which is a pitfall to be aware of if you're using that one, even if you otherwise use the new C++11 generators found in <random>.
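One way around that pitfall, sketched here under the assumption that C++11 is available, is std::shuffle, which takes the generator explicitly instead of possibly falling back on rand. The helper name shuffled_copy is hypothetical. (std::random_shuffle was deprecated in C++14 and removed in C++17, so this is the direction the standard went as well.)

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical helper: shuffle a copy of v with a privately seeded engine,
// so the resulting order depends only on `seed`, not on global rand() state.
std::vector<int> shuffled_copy(std::vector<int> v, unsigned seed) {
    std::mt19937 engine(seed);
    std::shuffle(v.begin(), v.end(), engine);
    return v;
}
```

With the same seed and the same standard library, shuffled_copy returns the same permutation on every call. The exact permutation may differ between standard library implementations, because the algorithm inside std::shuffle is not specified.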
About the actual issue, calling srand twice (or even more often) is no problem. You can in principle call it as often as you want; all it does is change the seed, and consequently the pseudorandom sequence that follows. I'm wondering why an XML library would want to call it at all, but they're right in their response: it is not illegitimate for them to do it. But it also doesn't matter.
The only important thing to make sure is that either you don't care about getting any particular pseudorandom sequence (that is, any sequence will do, you're not interested in reproducing an exact sequence), or you are the last one to call srand, which will override any prior calls.
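A small sketch of the "last caller wins" behaviour: whatever seeds were set earlier, the values that follow depend only on the most recent srand call. The function name draw_after_seed is made up for illustration.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper: reseed the global generator, then immediately
// draw n values from it.
std::vector<int> draw_after_seed(unsigned seed, int n) {
    std::srand(seed);  // overrides whatever seed was set before
    std::vector<int> out;
    for (int i = 0; i < n; ++i) {
        out.push_back(std::rand());
    }
    return out;
}
```

Two calls to draw_after_seed(5, 4) return the same values even if some other code called srand with a different seed in between. This only holds as long as nothing else calls srand or rand between your seeding and your own draws.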
That said, implementing your own generator with good statistical properties and a sufficiently long period in 3-5 lines of code isn't all that hard either, with a little care. The main advantage (apart from speed) is that you control exactly where your state is and who modifies it.
It is unlikely that you will ever need periods much longer than 2^128 because of the sheer forbidding time to actually consume that many numbers. A 3 GHz computer consuming one number every cycle will run for about 10^21 years on a 2^128 period, so there's not much of an issue for humans with average lifespans. Even assuming that the supercomputer you run your simulation on is a trillion times faster, your great-great-great-grandchildren won't live to see the end of the period.
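The estimate is easy to check. The sketch below assumes one number consumed per clock cycle and a 365.25-day year; the function name years_to_exhaust is made up for illustration.

```cpp
#include <cmath>

// Years needed to consume a full period of 2^period_bits numbers
// at the given consumption rate (numbers per second).
double years_to_exhaust(int period_bits, double numbers_per_second) {
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    // std::ldexp(1.0, k) computes 2^k as a double.
    return std::ldexp(1.0, period_bits) / numbers_per_second / seconds_per_year;
}
```

years_to_exhaust(128, 3e9) comes out at roughly 3.6 x 10^21 years. A machine a trillion times faster would still need several billion years.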
Seen that way, periods like 2^19937, which current "state of the art" generators deliver, are really ridiculous; that's trying to improve the generator at the wrong end if you ask me (it's better to make sure they're statistically firm and that they recover quickly from a worst-case state, etc.). But of course, opinions may differ here.
This site lists a couple of fast generators with implementations. They're xorshift generators combined with an addition or multiplication step and a small (from 2 to 64 machine words) lag, which results in generators that are both fast and high quality (there's a test suite as well, and the site's author wrote a couple of papers on the subject, too). I'm using a modification of one of these (the 2-word 128-bit version ported to 64 bits, with shift triples modified accordingly) myself.
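To give an idea of how small such a generator can be, here is a sketch of the widely published xorshift64* variant (xorshift followed by a multiplication step). It is not the modified 128-bit generator mentioned above, only an illustration of the same family; the struct name and the fallback seed for a zero input are assumptions of this sketch. Its state is a single 64-bit word and its period is 2^64 - 1.

```cpp
#include <cstdint>

// Illustrative xorshift64* generator. All state lives in this object,
// so no other code can reseed or advance it.
struct XorShift64Star {
    std::uint64_t state;  // must never be zero

    // A zero seed would get stuck at zero, so substitute an arbitrary
    // nonzero constant (an assumption of this sketch).
    explicit XorShift64Star(std::uint64_t seed)
        : state(seed != 0 ? seed : 0x9E3779B97F4A7C15ULL) {}

    std::uint64_t next() {
        state ^= state >> 12;
        state ^= state << 25;
        state ^= state >> 27;
        return state * 0x2545F4914F6CDD1DULL;  // multiplication step
    }
};
```

Two XorShift64Star objects constructed with the same seed produce the same sequence, and each keeps its own state, which is the main advantage named above: you control exactly where the state is and who modifies it.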
Answer 3:
This problem is being tackled in C++11's random number generation, i.e. you can create an instance of a class:
std::default_random_engine e1;
which gives you full control over the random numbers generated from the object e1 (as opposed to whatever would be used in libxml). The general rule of thumb would then be to use the new construct, as you can generate your random numbers independently.
Very good documentation
To address your concerns - I also think that it would be bad practice to call srand() in a library like libxml. However, it's more that srand() and rand() are not designed to be used in the context you are trying to use them in - they are enough when you just need some random numbers, as libxml does. However, when you need reproducibility and want to be sure that you are independent of others, the new <random> header is the way to go for you. So, to sum up, I don't think it's good practice on the library's side, but it's hard to blame them for doing that. Also, I could not imagine them changing that, as a billion other pieces of software probably depend on it.
Answer 4:
The real answer here is that if you want to be sure that YOUR random number sequence isn't being altered by someone else's code, you need a random number context that is private to YOUR work. Note that calling srand is only one small part of this. For example, if you call some function in some other library that calls rand, it will also disrupt the sequence of YOUR random numbers.
In other words, if you want predictable behaviour from your code, based on random number generation, it needs to be completely separate from any other code that uses random numbers.
Others have suggested using the C++ 11 random number generation, which is one solution.
On Linux and other systems with compatible libraries, you could also use rand_r, which takes a pointer to an unsigned int holding the seed that is used for that sequence. So if you initialize that seed variable, then use it with all calls to rand_r, it will produce a sequence that belongs only to YOUR code. This is of course still the same old rand generator, just with a separate seed. The main reason I mention this is that you could fairly easily do something like this:
int myrand()
{
    static unsigned int myseed = ... some initialization of your choice ...;
    return rand_r(&myseed);
}
and simply call myrand instead of std::rand (and it should be doable to work this into the std::random_shuffle overload that takes a random generator parameter).
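As a rough sketch of how that could look with a shuffle, the wrapper below exposes a private rand_r seed through the generator interface that std::shuffle expects (rather than the older std::random_shuffle). The names PrivateRand and shuffled_with are hypothetical, and rand_r is POSIX, not standard C++, so this assumes a POSIX system.

```cpp
#include <stdlib.h>   // rand_r (POSIX), RAND_MAX
#include <algorithm>
#include <vector>

// Hypothetical wrapper: a generator whose only state is its own seed,
// advanced exclusively through rand_r.
struct PrivateRand {
    using result_type = unsigned int;
    unsigned int seed;

    explicit PrivateRand(unsigned int s) : seed(s) {}

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return RAND_MAX; }

    // rand_r returns a value in [0, RAND_MAX] and updates `seed` in place.
    result_type operator()() { return static_cast<result_type>(rand_r(&seed)); }
};

// Shuffle the numbers 0..9 using only the private generator.
std::vector<int> shuffled_with(unsigned int seed) {
    std::vector<int> v{0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    PrivateRand gen(seed);
    std::shuffle(v.begin(), v.end(), gen);
    return v;
}
```

Because the generator never touches the global rand state, calls to srand or rand elsewhere do not change the result of shuffled_with for a given seed. The quality of the numbers is still that of the platform's rand_r.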