C++ fine granular time

The following piece of code gives 0 as runtime of the function. Can anybody point out the error?

struct timeval start,end;
long seconds,useconds;
gettimeofday(&start, NULL);
int optimalpfs=optimal(n,ref,count);
gettimeofday(&end, NULL);
seconds  = end.tv_sec  - start.tv_sec;
useconds = end.tv_usec - start.tv_usec;
long opt_runtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
cout<<"\nOptimal Runtime is "<<opt_runtime<<"\n";

I get both start and end time as the same.I get the following output

Optimal Runtime is 0

Tell me the error please.

回答1:

POSIX 1003.1b-1993 specifies interfaces for clock_gettime() (and clock_getres()), and offers that with the MON option there can be a type of clock with a clockid_t value of CLOCK_MONOTONIC (so that your timer isn't affected by system time adjustments). If available on your system then these functions return a structure which has potential resolution down to one nanosecond, though the latter function will tell you exactly what resolution the clock has.

 struct timespec {
         time_t  tv_sec;         /* seconds */
         long    tv_nsec;        /* and nanoseconds */
 };

You may still need to run your test function in a loop many times for the clock to register any time elapsed beyond its resolution, and perhaps you'll want to run your loop enough times to last at least an order of magnitude more time than the clock's resolution.

Note though that apparently the Linux folks mis-read the POSIX.1b specifications and/or didn't understand the definition of a monotonically increasing time clock, and their CLOCK_MONOTONIC clock is affected by system time adjustments, so you have to use their invented non-standard CLOCK_MONOTONIC_RAW clock to get a real monotonic time clock.

Alternately one could use the related POSIX.1 timer_settime() call to set a timer running, a signal handler to catch the signal delivered by the timer, and timer_getoverrun() to find out how much time elapsed between the queuing of the signal and its final delivery, and then set your loop to run until the timer goes off, counting the number of iterations in the time interval that was set, plus the overrun.

Of course on a preemptive multi-tasking system these clocks and timers will run even while your process is not running, so they are not really very useful for benchmarking.

Slightly more rare is the optional POSIX.1-1999 clockid_t value of CLOCK_PROCESS_CPUTIME_ID, indicated by the presence of the _POSIX_CPUTIME from <time.h>, which represents the CPU-time clock of the calling process, giving values representing the amount of execution time of the invoking process. (Even more rare is the TCT option of clockid_t of CLOCK_THREAD_CPUTIME_ID, indicated by the _POSIX_THREAD_CPUTIME macro, which represents the CPU time clock, giving values representing the amount of execution time of the invoking thread.)

Unfortunately POSIX makes no mention of whether these so-called CPUTIME clocks count just user time, or both user and system (and interrupt) time, accumulated by the process or thread, so if your code under profiling makes any system calls then the amount of time spent in kernel mode may, or may not, be represented.

Even worse, on multi-processor systems, the values of the CPUTIME clocks may be completely bogus if your process happens to migrate from one CPU to another during its execution. The timers implementing these CPUTIME clocks may also run at different speeds on different CPU cores, and at different times, further complicating what they mean. I.e. they may not mean anything related to real wall-clock time, but only be an indication of the number of CPU cycles (which may still be useful for benchmarking so long as relative times are always used and the user is aware that execution time may vary depending on external factors). Even worse it has been reported that on Linux CPU TimeStampCounter-based CPUTIME clocks may even report the time that a process has slept.

If your system has a good working getrusage() system call then it will hopefully be able to give you a struct timeval for each of the the actual user and system times separately consumed by your process while it was running. However since this puts you back to a microsecond clock at best then you'll need to run your test code enough times repeatedly to get a more accurate timing, calling getrusage() once before the loop, and again afterwards, and the calculating the differences between the times given. For simple algorithms this might mean running them millions of times, or more. Note also that on many systems the division between user time and system time is done somewhat arbitrarily and if examined separately in a repeated loop one or the other can even appear to run backwards. However if your algorithm makes no system calls then summing the time deltas should still be a fair total time for your code execution.

BTW, take care when comparing time values such that you don't overflow or end up with a negative value in a field, either as @Nim suggests, or perhaps like this (from NetBSD's <sys/time.h>):

    #define timersub(tvp, uvp, vvp)                             \
        do {                                                    \
            (vvp)->tv_sec = (tvp)->tv_sec - (uvp)->tv_sec;      \
            (vvp)->tv_usec = (tvp)->tv_usec - (uvp)->tv_usec;   \
            if ((vvp)->tv_usec < 0) {                           \
                (vvp)->tv_sec--;                                \
                (vvp)->tv_usec += 1000000;                      \
            }                                                   \
        } while (0)

(you might even want to be more paranoid that tv_usec is in range)

One more important note about benchmarking: make sure your function is actually being called, ideally by examining the assembly output from your compiler. Compiling your function in a separate source module from the driver loop usually convinces the optimizer to keep the call. Another trick is to have it return a value that you assign inside the loop to a variable defined as volatile.

回答2:

You've got weird mix of floats and ints here:

long opt_runtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;

Try using:

long opt_runtime = (long)(seconds * 1000 + (float)useconds/1000);

This way you'll get your results in milliseconds.

回答3:

The execution time of optimal(...) is less than the granularity of gettimeofday(...). This likely happes on Windows. On Windows the typical granularity is up to 20 ms. I've answered a related gettimeofday(...) question here. For Linux I asked How is the microsecond time of linux gettimeofday() obtained and what is its accuracy? and got a good result.

More information on how to obtain accurate timing is described in this SO answer.

回答4:

I normally do such a calculation as:

long long ss = start.tv_sec * 1000000LL + start.tv_usec;
long long es = end.tv_sec * 1000000LL + end.tv_usec;

Then do a difference

long long microsec_diff = es - ss;

Now convert as required:

double seconds = microsec_diff / 1000000.;

Normally, I don't bother with the last step, do all timings in microseconds.