Find missing minute data in time series data using

Posted 2020-07-23 08:57

Question:

I have time series data with one reading per minute. Because of problems with the sensor, sometimes the data is not fetched and that minute is not recorded. I want to find the day and hour in which this happened. The data is in a pandas DataFrame. Below is the code snippet that stores it in the DataFrame; I would like to display which minutes' data was not received.

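import numpy as np
import pandas as pd
from pymongo import MongoClient

l = ['Year', 'Month', 'Day', 'Hour', 'Minute']
df = pd.DataFrame(columns=l)
k = 0

if __name__ == '__main__':
    client = MongoClient("localhost", 27017, maxPoolSize=50)
    db = client.test
    collection = db['data']
    # 1000 most recent readings of device 3, returning only the timestamp and cd fields
    cursor = collection.find({"deviceId": 3}, {"timestamp": 1, "cd": 1}).sort("timestamp", -1).limit(1000)
    for document in cursor:
        # look the timestamp up by key; slicing document.items() fails on Python 3
        value = document["timestamp"]
        df.loc[k, 'Year'] = value.year
        df.loc[k, 'Month'] = value.month
        df.loc[k, 'Day'] = value.day
        df.loc[k, 'Hour'] = value.hour
        df.loc[k, 'Minute'] = value.minute
        k = k + 1
    # reference series of the 60 minutes in an hour (0-59)
    minute_a = pd.Series(np.arange(0, 60))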

I was thinking of using the isin function with minute_a, but since I have to check every hour, I can't figure out how to do it.
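The per-hour isin idea can be sketched as follows. This is a minimal sketch, assuming the DataFrame has the Year/Month/Day/Hour/Minute columns built above; the helper missing_minutes_per_hour and the sample rows are hypothetical, for illustration only.

```python
import numpy as np
import pandas as pd

def missing_minutes_per_hour(df, minute_a):
    """Map each (Year, Month, Day, Hour) that has gaps to its missing minutes."""
    result = {}
    for key, group in df.groupby(['Year', 'Month', 'Day', 'Hour']):
        # isin flags which of the 60 reference minutes occur in this hour's rows
        missing = minute_a[~minute_a.isin(group['Minute'])]
        if not missing.empty:
            result[tuple(int(v) for v in key)] = missing.tolist()
    return result

# Hypothetical sample in the question's layout: one row per received minute.
# Hour 10 is missing minutes 5 and 6; hour 11 is missing minute 59.
rows = [(2017, 7, 23, 10, m) for m in range(60) if m not in (5, 6)]
rows += [(2017, 7, 23, 11, m) for m in range(59)]
df = pd.DataFrame(rows, columns=['Year', 'Month', 'Day', 'Hour', 'Minute'])
minute_a = pd.Series(np.arange(0, 60))

print(missing_minutes_per_hour(df, minute_a))
# {(2017, 7, 23, 10): [5, 6], (2017, 7, 23, 11): [59]}
```

One limitation of this approach: an hour with no readings at all produces no group, so it is never reported.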

Answer 1:

isin would be a good way of doing this. However, the easiest approach is to flatten your sensor timestamps into a single DatetimeIndex so that you can compare it against a reference DatetimeIndex containing every expected minute.
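Flattening the question's five columns into one DatetimeIndex could look like the sketch below. It relies on pd.to_datetime, which can assemble datetimes from a DataFrame with year, month, day, hour and minute columns; the helper name to_datetime_index and the sample rows are hypothetical.

```python
import pandas as pd

def to_datetime_index(df):
    """Combine Year/Month/Day/Hour/Minute columns into one DatetimeIndex."""
    cols = ['Year', 'Month', 'Day', 'Hour', 'Minute']
    # pd.to_datetime assembles datetimes from columns named year, month, day,
    # hour, minute; lower-casing the names makes the match explicit
    parts = df[cols].rename(columns=str.lower)
    # sorted ascending for readability; isin does not require sorting
    return pd.DatetimeIndex(pd.to_datetime(parts)).sort_values()

# Hypothetical rows in the question's layout (newest first, as the Mongo query sorts them)
df = pd.DataFrame(
    [[2017, 7, 23, 10, 3], [2017, 7, 23, 10, 1], [2017, 7, 23, 10, 0]],
    columns=['Year', 'Month', 'Day', 'Hour', 'Minute'],
)
idx_dat = to_datetime_index(df)
print(idx_dat)
```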

from datetime import datetime, timedelta
import pandas as pd

# creating reference DatetimeIndex idx_ref with a minute frequency over the last day
end = datetime.now().replace(second=0, microsecond=0)
dt = end - timedelta(days=1)
idx_ref = pd.date_range(start=dt, end=end, freq='min')

# idx_dat represents your DatetimeIndex from the sensor (timestamps truncated to whole minutes)
gaps = idx_ref[~idx_ref.isin(idx_dat)]

This assumes, of course, that you only need the timestamps of the missing minutes.
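To report the day and hour of each gap, the missing minutes can be grouped by the hour they fall in. The sketch below is a variation on the answer under two assumptions: the reference window spans the first to the last reading rather than the last 24 hours from datetime.now(), and sensor timestamps may carry seconds, so they are floored to whole minutes first. The helpers minute_gaps and gaps_by_hour and the sample readings are hypothetical.

```python
import pandas as pd

def minute_gaps(idx_dat):
    """Missing minutes between the first and last sensor reading (inclusive)."""
    # Drop seconds so readings line up with a whole-minute reference grid
    idx_dat = pd.DatetimeIndex(idx_dat).floor('min')
    # Assumption: the window is the observed span, not "the last 24 hours"
    idx_ref = pd.date_range(start=idx_dat.min(), end=idx_dat.max(), freq='min')
    return idx_ref[~idx_ref.isin(idx_dat)]

def gaps_by_hour(gaps):
    """Group missing minutes by the hour (and therefore day) they occurred in."""
    minutes = pd.Series(gaps.minute, index=gaps.floor('h'))
    return minutes.groupby(level=0).agg(list)

# Hypothetical readings: every minute from 10:00 to 11:59 except 10:02, 10:03, 11:58
full = pd.date_range('2017-07-23 10:00', '2017-07-23 11:59', freq='min')
readings = full.drop(pd.to_datetime(['2017-07-23 10:02', '2017-07-23 10:03', '2017-07-23 11:58']))

gaps = minute_gaps(readings)
print(gaps_by_hour(gaps))
```

Using the observed span avoids reporting minutes before the first or after the last reading as gaps; keep the datetime.now() window if you also need to catch an outage that is still ongoing.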