Python 3 datetime.fromtimestamp fails by 1 microsecond

Published 2019-02-24 00:49

Question:

I want to save datetimes with microsecond resolution as timestamps. But it seems that the Python 3 datetime module loses one microsecond when loading them back. To test this, let's create a script:

test_datetime.py:

from random import randint
from datetime import datetime

now = datetime.now()

for n in range(1000):
    d = datetime(year=now.year, month=now.month, day=now.day,
            hour=now.hour, minute=now.minute, second=now.second,
            microsecond=randint(0,999999))

    ts = d.timestamp()
    d2 = datetime.fromtimestamp(ts)

    assert d == d2, 'failed in pass {}: {} != {}'.format(n, d, d2)

Running python3 test_datetime.py always fails, off by one microsecond:

Traceback (most recent call last):
  File "test_datetime.py", line 14, in <module>
    assert d == d2, 'failed in pass {}: {} != {}'.format(n, d, d2)
AssertionError: failed in pass 4: 2014-07-02 11:51:46.984716 != 2014-07-02 11:51:46.984715

Is this behavior expected? Should we not rely on datetime.fromtimestamp if we want microsecond resolution?

Answer 1:

Timestamp values are floating point values. Floating point values are approximations, and as such, rounding errors apply.

The float value 1404313854.442585, for example, is not exact. It is really:

>>> dt = datetime(2014, 7, 2, 16, 10, 54, 442585)
>>> dt.timestamp()
1404313854.442585
>>> format(dt.timestamp(), '.20f')
'1404313854.44258499145507812500'

That's awfully close to 442585, but not quite: it is just below it. So when you take the decimal portion, multiply it by 1 million, and then take just the integer portion, the 0.991455078125 remainder is discarded and you end up with 442584.
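The truncation described above can be reproduced directly with the float from the interpreter session:

```python
# The stored double for 1404313854.442585 sits just below .442585,
# so truncating at microsecond precision yields 442584.
ts = 1404313854.442585          # really 1404313854.44258499145...
frac = ts - int(ts)             # fractional seconds
micros = int(frac * 1_000_000)  # int() truncation discards the remainder
print(micros)                   # -> 442584
```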

As such, when you then convert the floating point value back to a datetime object, 1 microsecond rounding errors are normal.

If you require precision, don't rely on float; perhaps instead store the microsecond value as a separate integer, then use dt.fromtimestamp(seconds).replace(microsecond=microseconds).
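A minimal sketch of that suggestion, storing the two integers side by side (the helper names to_parts and from_parts are made up for illustration, and int() truncation assumes post-1970 timestamps):

```python
from datetime import datetime

def to_parts(dt):
    # Split into (whole seconds since the epoch, microseconds); both
    # are exact integers, so nothing is lost to float rounding.
    return int(dt.timestamp()), dt.microsecond

def from_parts(seconds, microseconds):
    # Rebuild the datetime losslessly from the two integers.
    return datetime.fromtimestamp(seconds).replace(microsecond=microseconds)

d = datetime.now()
s, us = to_parts(d)
assert from_parts(s, us) == d  # exact round trip, every time
```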

You may find the rejection notice to PEP-410 (Use decimal.Decimal type for timestamps) enlightening in this context. The PEP touched upon the precision issue with timestamps represented as floats.



Answer 2:

A timestamp is a POSIX time, which is essentially conceptualized as a number of seconds since an arbitrary "epoch". datetime.fromtimestamp() returns "the local date and time corresponding to the POSIX timestamp, such as is returned by time.time()", and the documentation for time.time() tells us it "Return[s] the time in seconds since the epoch as a floating point number. Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second."

Expecting six decimal digits of precision to be retained through a conversion to and back from a timestamp seems a little unreasonable when the intermediate data type doesn't in fact guarantee sub-second accuracy. Floating point numbers are unable to represent all decimal values exactly.
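The decimal module makes this visible: converting a float literal to Decimal shows the exact value of the double Python actually stores, which is not the number we wrote (a quick illustration, not part of either answer's original code):

```python
from decimal import Decimal

# The decimal fraction 0.442585 has no finite binary representation,
# so the float literal stores the nearest double instead.
print(Decimal(0.442585))                          # the double's exact value
print(Decimal("0.442585") == Decimal(0.442585))   # -> False
```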

EDIT: The following code tests which microsecond values do not survive the round trip, for an arbitrary base datetime taken at the moment the program is run.

from datetime import datetime

baset = datetime.now()

# Try every possible microsecond value and record those that do not
# survive the float round trip.
dodgy = []
for i in range(1000000):
    d = baset.replace(microsecond=i)
    ts = d.timestamp()
    if d != datetime.fromtimestamp(ts):
        dodgy.append(i)
print(len(dodgy))

I got 499,968 "dodgy" values, but I haven't examined them further.
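For what it's worth, the size of the error can be measured too; sampling the loop above suggests the round trip is only ever off by a single microsecond. Note that the count depends on the Python version: newer CPython releases round the timestamp to the nearest microsecond in fromtimestamp rather than truncating, so the loop may find no mismatches at all.

```python
from datetime import datetime

baset = datetime.now()
errors = []
for i in range(0, 1_000_000, 997):  # sample for speed
    d = baset.replace(microsecond=i)
    d2 = datetime.fromtimestamp(d.timestamp())
    if d2 != d:
        errors.append(abs((d2 - d).total_seconds()))

# Every mismatch observed (if any) is exactly one microsecond.
assert all(err == 1e-6 for err in errors)
print(len(errors), "mismatches in sample")
```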