I am confused how pandas blew out of bounds for datetime objects with these lines:
import pandas as pd
BOMoffset = pd.tseries.offsets.MonthBegin()
# here some code sets the all_treatments dataframe and the newrowix, micolix, mocolix counters
all_treatments.iloc[newrowix,micolix] = BOMoffset.rollforward(all_treatments.iloc[i,micolix] + pd.tseries.offsets.DateOffset(months = x))
all_treatments.iloc[newrowix,mocolix] = BOMoffset.rollforward(all_treatments.iloc[newrowix,micolix]+ pd.tseries.offsets.DateOffset(months = 1))
Here all_treatments.iloc[i,micolix]
is a datetime set by pd.to_datetime(all_treatments['INDATUMA'], errors='coerce',format='%Y%m%d')
, and INDATUMA
is date information in the format 20070125
.
This logic seems to work on mock data (no errors, dates make sense), so at the moment I cannot reproduce while it fails in my entire data with the following error:
pandas.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2262-05-01 00:00:00
Setting the
errors
parameter inpd.to_datetime
to'coerce'
causes replacement of out of bounds values withNaT
. Quoting the docs:E.g.:
This does not fix the data (obviously), but still allows processing the non-NaT data points.
Since pandas represents timestamps in nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years
And your value is out of this range 2262-05-01 00:00:00 and hence the outofbounds error
Straight out of: http://pandas-docs.github.io/pandas-docs-travis/timeseries.html#timestamp-limitations