I have a Pandas dataframe which is indexed by a DatetimeIndex:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss) 53732 non-null values
Julian_Day 53732 non-null values
AOT_870 53732 non-null values
440-870Angstrom 53732 non-null values
440-675Angstrom 53732 non-null values
500-870Angstrom 53732 non-null values
Last_Processing_Date(dd/mm/yyyy) 53732 non-null values
Solar_Zenith_Angle 53732 non-null values
time 53732 non-null values
dtypes: datetime64[ns](2), float64(6), object(1)
I want to find the row that is closest to a certain time:
image_time = dateutil.parser.parse('2009-07-28 13:39:02')
and find how close it is. So far, I have tried various things based upon the idea of subtracting the time I want from all of the times and finding the smallest absolute value, but none quite seem to work.
For example:
aeronet.index - image_time
Gives an error which I think is due to +/- on a Datetime index shifting things, so I tried putting the index into another column and then working on that:
aeronet['time'] = aeronet.index
aeronet.time - image_time
This seems to work, but to do what I want, I need to get the ABSOLUTE time difference, not the relative difference. However, just running abs
or np.abs
on it gives an error:
abs(aeronet.time - image_time)
C:\Python27\lib\site-packages\pandas\core\series.pyc in __repr__(self)
1061 Yields Bytestring in Py2, Unicode String in py3.
1062 """
-> 1063 return str(self)
1064
1065 def _tidy_repr(self, max_vals=20):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __str__(self)
1021 if py3compat.PY3:
1022 return self.__unicode__()
-> 1023 return self.__bytes__()
1024
1025 def __bytes__(self):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __bytes__(self)
1031 """
1032 encoding = com.get_option("display.encoding")
-> 1033 return self.__unicode__().encode(encoding, 'replace')
1034
1035 def __unicode__(self):
C:\Python27\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
1044 else get_option("display.max_rows"))
1045 if len(self.index) > (max_rows or 1000):
-> 1046 result = self._tidy_repr(min(30, max_rows - 4))
1047 elif len(self.index) > 0:
1048 result = self._get_repr(print_header=True,
C:\Python27\lib\site-packages\pandas\core\series.pyc in _tidy_repr(self, max_vals)
1069 """
1070 num = max_vals // 2
-> 1071 head = self[:num]._get_repr(print_header=True, length=False,
1072 name=False)
1073 tail = self[-(max_vals - num):]._get_repr(print_header=False,
AttributeError: 'numpy.ndarray' object has no attribute '_get_repr'
Am I approaching this the right way? If so, how should I get abs
to work, so that I can then select the minimum absolute time difference, and thus get the closest time. If not, what is the best way to do this with a Pandas time-series?
I was confronting the same problem today. I wanted a function able to give me the closest value prior a given timestamp. Here is the function I got:
In the case that you need the global closest (and not the closest before as in my case), you can use:
You can find more information in the
get_loc
documentation.I think you can try
DatetimeIndex.asof
to find the most recent label up to and including the input. Then use the returned datetime to select the appropriate row. If you only need values for a particular column,Series.asof
exists and combines the two steps above into one.This assumes you want the closest datetime. If you don't care about the date and just want the same time every day, use
at_time
in DataFrame.Follow up:
Edit: false alarm, I had an older version locally. The latest on master should work with np.abs.
Also just to clarify:
aeronet.index - image_time doesn't work because subtraction on Index is a set difference (back in the day Index used to be constrained to be unique).
This simple method will return the (integer index of the) TimeSeriesIndex entry closest to a given datetime object. There's no need to copy the index to a regular column - simply use the
.to_pydatetime
method instead.Then you simply use the DataFrame's
.iloc
indexer:Here's a function to do this:
You can then further filter seamlessly, e.g.