numpy datetime64 add or subtract date interval

Posted 2020-08-09 08:49

Question:

I am parsing a huge ASCII file with dates assigned to entries, and I found myself using the datetime package in parallel with numpy.datetime64 to get array capabilities. I know pandas is probably the most recommended package for dates, but I am trying to pull this off without it. I have been looking for a neat way to add or subtract a calendar step, such as one year or three months, from a datetime64 object.

Currently, I convert the dt64 object to a dt object, use the replace function to change the year (for example), and then convert it back to dt64, which feels a bit messy. I would appreciate a better solution that uses only the numpy.datetime64 format.

Example: Converting a "YYYY-12-31" to "(YYYY-1)-12-31"

import numpy as np
a = np.datetime64('2014-12-31')                  # a is a dt64 object
b = a.astype(object)                             # b is a datetime.date object converted from a
c = np.datetime64(b.replace(year=b.year - 1))    # c is a dt64 object shifted back 1 year (a - 1 year)

Answer 1:

You can use a numpy.timedelta64 object to perform time delta calculations on a numpy.datetime64 object; see "Datetime and Timedelta Arithmetic" in the NumPy documentation.

Since a year can be either 365 or 366 days, you cannot subtract a year directly from a day-resolution date, but you can subtract 365 days instead:

import numpy as np
np.datetime64('2014-12-31') - np.timedelta64(365,'D')

results in:

numpy.datetime64('2013-12-31')
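
One caveat with the fixed 365-day offset is that it lands on a different calendar day whenever the span covers a February 29. The following is a minimal sketch of that effect; the variable names are illustrative and do not come from the original answer.

```python
import numpy as np

one_year_of_days = np.timedelta64(365, 'D')

# 2014 is not a leap year, so 365 days back lands on the same calendar day.
plain = np.datetime64('2014-12-31') - one_year_of_days

# 2016 is a leap year (366 days), so 365 days back stays inside 2016.
leap = np.datetime64('2016-12-31') - one_year_of_days

print(plain)  # 2013-12-31
print(leap)   # 2016-01-01
```

So the 365-day approach is fine when an approximate offset is enough, but it is not a true "same date, previous year" shift.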



Answer 2:

How about:

import numpy as np
import pandas as pd

def numpy_date_add(vd_array, y_array):
    # Truncate to the year, add the year offsets, then restore the month and day offsets
    years = vd_array.astype('M8[Y]')
    months = vd_array.astype('M8[M]')
    days = vd_array.astype('M8[D]')
    shifted = years + np.timedelta64(1, 'Y') * y_array
    shifted = shifted.astype('M8[M]') + (months - years)
    return shifted.astype('M8[D]') + (days - months)

# usage
valDate = pd.Timestamp(2016, 12, 31)   # pd.datetime was removed from recent pandas versions
per = [[0, 3, '0-3Yr'],
       [3, 7, '3-7Yrs'],
       [7, 10, '7-10Yrs'],
       [10, 15, '10-15Yrs'],
       [15, 20, '15-20Yrs'],
       [20, 30, '20-30Yrs'],
       [30, 40, '30-40Yrs'],
       [40, 200, '> 40Yrs']]
pert = pd.DataFrame(per, columns=['start_period', 'end_period', 'mat_band'])
pert['valDate'] = valDate
pert['startdate'] = numpy_date_add(pert.valDate.values, pert.start_period.values)
pert['enddate'] = numpy_date_add(pert.valDate.values, pert.end_period.values)

print(pert)

This is vectorized, works on pandas columns, and I think it deals with leap years.
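
For the "3 months" case raised in the question, the same truncate-and-restore idea seems to carry over to month units without pandas. Below is a minimal sketch, assuming a hypothetical helper named shift_months (it is not part of NumPy): it truncates each date to its month, shifts by whole months, and then adds back the day-of-month offset.

```python
import numpy as np

def shift_months(dates, months):
    """Shift datetime64 dates by whole calendar months (hypothetical helper)."""
    dates = np.asarray(dates, dtype='M8[D]')
    month_start = dates.astype('M8[M]')   # truncate to the first of the month
    day_offset = dates - month_start      # timedelta64[D] offset within the month
    shifted = month_start + np.asarray(months) * np.timedelta64(1, 'M')
    return shifted.astype('M8[D]') + day_offset

a = np.array(['2014-12-31', '2016-02-29'], dtype='M8[D]')
print(shift_months(a, -12))  # ['2013-12-31' '2015-03-01']
print(shift_months(a, 3))    # ['2015-03-31' '2016-05-29']
```

Note the design choice: when the target month is shorter than the day offset, the date rolls over into the next month (2016-02-29 minus 12 months becomes 2015-03-01) rather than being clamped to the last day of the month. If clamping is needed, that would have to be handled separately.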