Is defining a probability distribution costly?

2019-08-06 03:34发布

问题:

I'm coding a physics simulation and I'm now feeling the need for optimizing it. I'm thinking about improving one point: one of the methods of one of my class (which I call a billion times in several cases) defines everytime a probability distribution. Here is the code:

void myClass::myMethod(){ //called billions of times in several cases
uniform_real_distribution<> probd(0,1);
uniform_int_distribution<> probh(1,h-2);
uniform_int_distribution<> probv(1,v-2);
    //rest of the code
}

Could I pass the distribution as member of the class so that I won't have to define them everytime? And just initialize them in the constructor and redefine them when h and v change? Can it be a good optimizing progress? And last question, could it be something that is already corrected by the compiler (g++ in my case) when compiled with flag -O3 or -O2?

Thank you in advance!

Update: I coded it and timed both: the program turned actually a bit slower (a few percents) so I'm back at what I started with: creating the probability distributions at each loop

回答1:

Answer A: I shouldn't think so, for a uniform distribution it's just going to copy the parameter values into place, maybe with a small amount of arithmetic, and that will be well optimized.

However, I believe distribution objects can have state. They can use part of the random data from a call to the generator, and are permitted save the rest of the randomness to use next time the distribution is used, in order to reduce the total number of calls to the generator. So when you destroy a distribution object you might be discarding some possibly-costly random data.

Answer B: stop guessing and test it.

Time your code, then add static to the definition of probd and time it again.



回答2:

  1. Yes
  2. Yes
  3. Well, there may be some advantage, but AFAIK those objects aren't really heavyweight/expensive to construct. Also, with locals you may gain something in data locality and in assumptions the optimizer can make.
  4. I don't think they are automatically moved as class variables (especially if your class is POD - in that case I doubt the compiler will dare to modify its layout); most probably, instead, they are completely optimized away - only the code of the called methods - in particular operator() - may remain, referring directly to h and v. But this must be checked by looking at the generated assembly.

Incidentally, if you have a performance problem, besides optimizing obvious points (non-optimal algorithms used in inner loops, continuous memory allocations, removing useless copies of big objects, ...) you should really try to use a profiler to find the real "hot spots" in your code, and concentrate to optimize them instead of going randomly through all the code.



回答3:

uniform_real_distribution maintains a state of type param_type which is two double values (using default template parameters). The constructor assigns to these and is otherwise trivial, the destructor is trivial.

Therefore, constructing a temporary within your function has an overhead of storing 2 double values as compared to initializing 1 pointer (or reference) or going through an indirection via this. In theory, it might therefore be faster (though, what appears to be faster, or what would make sense to run faster isn't necessary any faster). Since it's not much work, it's certainly worth trying and timing whether there's a difference, even if it is a micro-optimization.

Some 3-4 extra cycles are normally neglegible, but since you're saying "billions of times" it may of course very well make a measurable difference. 3 cycles times one billion is 1 second on a 3GHz machine.

Of course, optimization without profiling is always somewhat... awkward. You might very well find that a different part in your code that's called billions of times saves a lot more cycles.

EDIT:
Since you're not going to modify it, and since the first distribution is initialized with literal values, you might actually make it a constant (such as a constexpr or namespace level static const). That should, regardless of the other two, allow the compiler to generate the most efficient code in any case for that one.