How to round a Pandas `DatetimeIndex`?

Posted 2019-02-16 09:48

I have a pandas.DatetimeIndex, e.g.:

pd.date_range('2012-1-1 02:03:04.000',periods=3,freq='1ms')
>>> [2012-01-01 02:03:04, ..., 2012-01-01 02:03:04.002000]

I would like to round the dates (Timestamps) to the nearest second. How do I do that? The expected result is similar to:

[2012-01-01 02:03:04.000000, ..., 2012-01-01 02:03:04.000000]

Is it possible to accomplish this by rounding a Numpy datetime64[ns] to seconds without changing the dtype [ns]?

np.array(['2012-01-02 00:00:00.001'],dtype='datetime64[ns]')

4 Answers
男人必须洒脱
#2 · 2019-02-16 10:07

The round() method was added for DatetimeIndex, Timestamp, TimedeltaIndex and Timedelta in pandas 0.18.0, so we can now do the following:

In[114]: index = pd.DatetimeIndex([pd.Timestamp('2012-01-01 02:03:04.000'), pd.Timestamp('2012-01-01 02:03:04.002'), pd.Timestamp('20130712 02:03:04.500'), pd.Timestamp('2012-01-01 02:03:04.501')])

In[115]: index.values
Out[115]: 
array(['2012-01-01T02:03:04.000000000', '2012-01-01T02:03:04.002000000',
       '2013-07-12T02:03:04.500000000', '2012-01-01T02:03:04.501000000'], dtype='datetime64[ns]')

In[116]: index.round('S')
Out[116]: 
DatetimeIndex(['2012-01-01 02:03:04', '2012-01-01 02:03:04',
               '2013-07-12 02:03:04', '2012-01-01 02:03:05'],
              dtype='datetime64[ns]', freq=None)

round() accepts a frequency parameter. The valid string aliases are listed in the pandas documentation on offset aliases.
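As a small sketch of the same API, the index also has floor() and ceil() methods that take the same frequency argument. The sample timestamps below are made up for illustration, and the lowercase alias 's' is used because recent pandas releases deprecate the uppercase 'S' spelling:

```python
import pandas as pd

# Illustrative timestamps, all within the same second
index = pd.DatetimeIndex([
    "2012-01-01 02:03:04.002",
    "2012-01-01 02:03:04.501",
    "2012-01-01 02:03:04.999",
])

rounded = index.round("s")  # nearest whole second
floored = index.floor("s")  # always down to the start of the second
ceiled = index.ceil("s")    # always up to the next whole second

print(rounded)
print(floored)
print(ceiled)
```

All three return a new DatetimeIndex with the same dtype as the input, so the resolution of the index is preserved.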

我命由我不由天
#3 · 2019-02-16 10:07

There is little point in changing the index itself, since you can simply generate a new one using date_range with the desired frequency parameter, as in your question.

I assume what you are trying to do is change the frequency of a time series that contains data, in which case you can use resample (see the pandas documentation). For example, if you have the following time series:

import numpy as np
import pandas as pd
dt_index = pd.date_range('2012-1-1 00:00.001',periods=3, freq='1ms')
ts = pd.Series(np.random.randn(3), index=dt_index)


2012-01-01 00:00:00           0.594618
2012-01-01 00:00:00.001000    0.874552
2012-01-01 00:00:00.002000   -0.700076
Freq: L

Then you can change the frequency to seconds using resample, specifying how you want to aggregate the values (mean, sum etc.):

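# how= has been removed from resample; closed/label='right' matches the binning in the output below
ts.resample('S', closed='right', label='right').sum()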

2012-01-01 00:00:00    0.594618
2012-01-01 00:00:01    0.174475
Freq: S
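A minimal, deterministic sketch of the same idea with the current method-chaining form of resample. The values and the 500 ms spacing are made up for illustration, and the default binning (left-closed, left-labelled) is used:

```python
import pandas as pd

# Four illustrative samples, 500 ms apart, spanning two whole seconds
idx = pd.date_range("2012-01-01 00:00:00", periods=4, freq="500ms")
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Aggregate to one row per second; the aggregation is chosen by the method called
per_second_sum = ts.resample("1s").sum()
per_second_mean = ts.resample("1s").mean()

print(per_second_sum)
print(per_second_mean)
```

Unlike rounding the index, this collapses several rows into one per bin, so it only fits when the data itself should be aggregated.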
做自己的国王
#4 · 2019-02-16 10:21

For more general rounding, you can make use of the fact that Pandas Timestamp objects mostly use the standard library datetime.datetime API, including the datetime.datetime.replace() method.

So, to round to the nearest second, you could do:

import datetime
import pandas as pd

times = pd.date_range('2012-1-1 02:03:04.499',periods=3,freq='1ms')
# Add 5e5 microseconds and truncate to simulate rounding
times_rounded = [(x + datetime.timedelta(microseconds=5e5)).replace(microsecond=0) for x in times]

from IPython.display import display
print('Before:')
display(list(times))
print('After:')
display(list(times_rounded))

Output:

Before:
[Timestamp('2012-01-01 02:03:04.499000', offset='L'),
 Timestamp('2012-01-01 02:03:04.500000', offset='L'),
 Timestamp('2012-01-01 02:03:04.501000', offset='L')]
After:
[Timestamp('2012-01-01 02:03:04', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L'),
 Timestamp('2012-01-01 02:03:05', offset='L')]

You can use the same technique to, e.g., round to the nearest day (as long as you're not concerned about leap seconds and the like):

times = pd.date_range('2012-1-1 08:00:00', periods=3, freq='4H')
times_rounded = [(x + datetime.timedelta(hours=12)).replace(hour=0, minute=0, second=0, microsecond=0) for x in times]

Inspired by this SO post: https://stackoverflow.com/a/19718411/1410871
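One detail worth knowing about the add-half-then-truncate technique is how it treats exact ties. The sketch below is a vectorized variant (an assumption on my part, not the code from this answer) that adds half a second and floors, compared with the built-in round(), which resolves ties to the even second:

```python
import pandas as pd

# Two illustrative timestamps that sit exactly halfway between seconds
times = pd.DatetimeIndex([
    "2012-01-01 02:03:04.500",
    "2012-01-01 02:03:05.500",
])

# Add half a second, then truncate: ties always go up
half_up = (times + pd.Timedelta("500ms")).floor("s")

# Built-in rounding: ties go to the even second
half_even = times.round("s")

print(half_up)
print(half_even)
```

For timestamps that are not exact ties, the two approaches give the same result.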

Emotional °昔
#5 · 2019-02-16 10:24

Update: if you're doing this to a DatetimeIndex / datetime64 column, a better way is to use np.round directly rather than via an apply/map:

np.round(dtindex_or_datetime_col.astype(np.int64), -9).astype('datetime64[ns]')
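For a plain NumPy datetime64[ns] array, as asked in the question, a similar effect can be had with integer arithmetic on the underlying nanosecond counts. This is a sketch under my own assumptions: the sample values and the NS_PER_SECOND name are illustrative, and ties round up rather than to even:

```python
import numpy as np

NS_PER_SECOND = 1_000_000_000  # illustrative constant name

arr = np.array(
    ["2012-01-02 00:00:00.001", "2012-01-02 00:00:00.600"],
    dtype="datetime64[ns]",
)

# Nanoseconds since the epoch as plain integers
ns = arr.astype(np.int64)

# Add half a second, then floor to a whole number of seconds
rounded_ns = (ns + NS_PER_SECOND // 2) // NS_PER_SECOND * NS_PER_SECOND

# Back to datetime64 without changing the [ns] unit
rounded = rounded_ns.astype("datetime64[ns]")

print(rounded)
print(rounded.dtype)
```

Staying in integer arithmetic avoids any intermediate floating-point step, and the dtype of the result remains datetime64[ns].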

Old answer (with some more explanation):

Whilst @Matti's answer is clearly the correct way to deal with your situation, I thought I would add an answer showing how you might round a Timestamp to the nearest second:

from pandas import Timestamp  # pandas.lib is no longer importable in current pandas

t1 = Timestamp('2012-1-1 00:00:00')
t2 = Timestamp('2012-1-1 00:00:00.000333')

In [4]: t1
Out[4]: Timestamp('2012-01-01 00:00:00')

In [5]: t2
Out[5]: Timestamp('2012-01-01 00:00:00.000333')

In [6]: t2.microsecond
Out[6]: 333

In [7]: t1.value
Out[7]: 1325376000000000000

In [8]: t2.value
Out[8]: 1325376000000333000

# To truncate instead of rounding: t2.value - t2.value % 1000000000
In [9]: int(round(t2.value, -9)) # round away milli-, micro- and nanoseconds
Out[9]: 1325376000000000000

In [10]: Timestamp(int(round(t2.value, -9)))
Out[10]: Timestamp('2012-01-01 00:00:00')

Hence you can apply this to the entire index:

def to_the_second(ts):
    # ts.value is the timestamp in nanoseconds since the epoch
    return Timestamp(int(round(ts.value, -9)))

dtindex.map(to_the_second)
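A self-contained version of the map approach might look like the following sketch. The dtindex values are made up for illustration, and the function is written for Python 3, where int replaces long:

```python
import pandas as pd


def to_the_second(ts):
    # ts.value is nanoseconds since the epoch; round to a multiple of 10**9
    return pd.Timestamp(int(round(ts.value, -9)))


# Illustrative index: three timestamps 400 ms apart
dtindex = pd.date_range("2012-01-01 00:00:00.000333", periods=3, freq="400ms")

rounded = dtindex.map(to_the_second)
print(rounded)
```

This calls a Python function once per element, so on large indexes it is likely to be much slower than the vectorized alternatives.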