How do I measure the execution time of a method in ECLiPSe CLP? Currently, I have this:
measure_traditional(Difficulty, Selection, Choice) :-
    statistics(runtime, _),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    time(solve_traditional(Difficulty, Selection, Choice, _)),
    statistics(runtime, [_|T]),
    write(T).
I need to measure the time it takes to perform the method solve_traditional(...) and write it out to a text file. However, it is not precise enough: sometimes time will print 0.015 or 0.016 seconds for the given method, but usually it prints 0.0 seconds.
Figuring that the method completes too quickly, I decided to use statistics(runtime, ...) to measure the time elapsed between two runtime calls. I could then measure, for example, the time it takes to complete 20 method calls and divide the measured time T by 20.
The only problem is that with 20 calls, T equals either 0, 16, 32 or 48 milliseconds. Apparently it measures the time for each method call separately and sums the execution times (each of which is often just 0.0 s). This defeats the whole purpose of measuring the runtime of N method calls and dividing the time T by N.
In short: the methods I'm currently using for execution-time measurement are inadequate. Is there a way to make the measurement more precise (9 decimal places, for example)?
Benchmarking is a tricky business in any programming language, and particularly so in CLP. Especially if you plan to publish your results, you should be extremely thorough and make absolutely sure you are measuring what you claim to measure.
Timers: Are you measuring real time, process CPU time, or thread CPU time? Including time spent in system calls? Including or excluding garbage collection? ...
See the different timers offered by the statistics/2 primitive.
There is a real-time high-resolution timer that can be accessed via statistics(hr_time,T).
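As an illustration, here is a minimal sketch (the helper name time_once/1 is invented for this answer, not a library predicate) that samples both the high-resolution real-time timer and the process CPU timer around a single run of a goal:

    % Hedged sketch: time_once/1 is an invented name, not a library predicate.
    time_once(Goal) :-
        statistics(hr_time, T0),        % real time in seconds (high resolution)
        statistics(runtime, [R0|_]),    % total process cpu time in milliseconds
        once(Goal),
        statistics(hr_time, T1),
        statistics(runtime, [R1|_]),
        Real is T1 - T0,
        Cpu is R1 - R0,
        writeln(real_seconds = Real),
        writeln(cpu_milliseconds = Cpu).

Comparing the two outputs already tells you whether your goal is CPU-bound or spends its time elsewhere.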
Timer resolution: In your example the timer resolution seems to be 1/60 sec. That means that to get 3 significant digits in your time measurement, you have to measure a runtime of at least 1000 * 1/60 ≈ 16.7 seconds.
If your benchmark runtime is too short, you have to run it multiple times.
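To make that arithmetic concrete, here is a small sketch (min_repetitions/4 is a name invented for this answer): given the timer resolution Res in seconds and a target of Digits significant digits, the total measured time must be at least 10^Digits * Res, so a goal with an estimated single-run time of PerCall seconds needs at least N repetitions:

    % Hedged sketch: minimum repetitions to reach the desired precision.
    min_repetitions(Res, Digits, PerCall, N) :-
        MinTime is 10^Digits * Res,              % shortest usable total runtime
        N is fix(ceiling(MinTime / PerCall)).    % round up to whole repetitions

For instance, min_repetitions(0.0167, 3, 0.015, N) gives N = 1114, i.e. over a thousand repetitions of a 15-millisecond goal.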
Runtime variance: On modern machines it is increasingly difficult to get reproducible timings. This is due to effects that have nothing to do with the program you are measuring, such as cache behaviour, paging, context switches, power management hardware, memory alignment, etc.
Run enough repetitions, run on a quiet machine, make sure your results are reproducible.
Repeating benchmarks: In a system like ECLiPSe, running benchmarks repeatedly must be done carefully to ensure that the successive runs really do the same computation, and ideally have the same or similar cache and garbage-collection behaviour.
In your code, you run the benchmark successively in a conjunction. This is not recommended, because variable instantiations, delayed goals or garbage can survive from previous runs and slow down or speed up subsequent runs. Instead, you can use the pattern
run_n_times(N, Goal) :-
    \+ ( between(1, N, 1, _), \+ Goal ).
which is essentially a way of repeating N times the sequence

    once(Goal), fail

The point of this is that the combination of once/1 and fail undoes all of Goal's computation, so that the next iteration starts as much as possible from a similar machine state. Unfortunately, this undo-process itself adds extra runtime, which distorts the measurement...
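As a hedged usage sketch (time_per_call/3 is a name invented here), this pattern combines with the high-resolution timer to give a mean time per run; choose N large enough for your timer resolution, as discussed above:

    % Hedged sketch: mean real time per run over N repetitions.
    time_per_call(N, Goal, PerCall) :-
        statistics(hr_time, T0),
        run_n_times(N, Goal),
        statistics(hr_time, T1),
        PerCall is (T1 - T0) / N.    % seconds per run

For example, time_per_call(1000, solve_traditional(Difficulty, Selection, Choice, _), T) with concrete argument values. Note that this still includes the overhead of run_n_times/2 itself, which is exactly what the next point is about.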
Test overheads: If you run your benchmark several times, you need a test framework that does that for you, and this contributes to the runtime you measure.
You either have to make sure that the overhead is negligible, or you have to measure the overhead (e.g. by running the test framework with a dummy benchmark) and subtract it, for example:
benchmark(N, DummyGoal, Goal, Time) :-
    cputime(T1),
    run_n_times(N, DummyGoal),
    cputime(T2),
    run_n_times(N, Goal),
    cputime(T3),
    Time is (T3-T2)-(T2-T1).
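To connect this back to your original requirement of writing the result to a text file, here is a hedged wrapper sketch (the wrapper name, file name, dummy goal true/0 and repetition count are all placeholders; cputime/1 reports seconds, so PerCall is in seconds per call):

    % Hedged sketch: benchmark solve_traditional/4 and log the mean time per call.
    measure_to_file(Difficulty, Selection, Choice) :-
        N = 1000,
        benchmark(N, true, solve_traditional(Difficulty, Selection, Choice, _), Time),
        PerCall is Time / N,
        open("times.txt", append, Stream),
        writeln(Stream, PerCall),
        close(Stream).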
CLP specifics: There are many other considerations specific to the kind of data-driven operations that occur in CLP solvers, and which make CLP runtimes very difficult to compare. These solvers have many internal degrees of freedom regarding scheduling of propagators, degrees of pruning, tie breaking rules in search control, etc.
A paper that discusses these things specifically is:
On Benchmarking Constraint Logic Programming Platforms, by Mark Wallace, Joachim Schimpf, Kish Shen and Warwick Harvey. In CONSTRAINTS Journal, ed. E.C. Freuder, 9(1), pp. 5-34, Kluwer, 2004.