
Question:

I have the following records returned from an API call as part of a larger dataset:

{'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=tzutc()), 'Price': '0.052600'}

{'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=tzutc()), 'Price': '0.052500'}

Ideally I would use the timestamp as the index of the pandas DataFrame. However, converting to JSON fails because the index then contains a duplicate:

import pandas as pd
df = df.set_index(pd.to_datetime(df['Time']))
print(df.to_json(orient='index'))

ValueError: DataFrame index must be unique for orient='index'.

Is there any guidance on the best way to deal with this situation? Should I throw away one data point? The timestamps are no finer-grained than one second, and there is evidently a price change within that second.
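A minimal, self-contained reproduction of the error, assuming the two records above. As an assumption, datetime.timezone.utc stands in for dateutil's tzutc() so that only the standard library and pandas are needed; the variable names are illustrative:

```python
import datetime

import pandas as pd

# Assumption: datetime.timezone.utc replaces dateutil's tzutc() from the API output.
utc = datetime.timezone.utc

# Two records that share the same one-second timestamp, as in the question.
records = [
    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=utc), 'Price': '0.052600'},
    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=utc), 'Price': '0.052500'},
]
df = pd.DataFrame(records)
df = df.set_index(pd.to_datetime(df['Time']))

error_message = None
try:
    # orient='index' uses the index values as JSON keys, so they must be unique.
    df.to_json(orient='index')
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # the "index must be unique" error quoted in the question
```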

Answer 1:

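I think you can make the duplicate datetimes unique by adding milliseconds: number the repeats within each timestamp with groupby + cumcount, then convert those counts to milliseconds with to_timedelta: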

import datetime
import pandas as pd

d = [{'Time': datetime.datetime(2017, 5, 21, 18, 18, 1), 'Price': '0.052600'},
     {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1), 'Price': '0.052500'}]
df = pd.DataFrame(d)
print (df)
      Price                Time
0  0.052600 2017-05-21 18:18:01
1  0.052500 2017-05-21 18:18:01

print (pd.to_timedelta(df.groupby('Time').cumcount(), unit='ms'))
0          00:00:00
1   00:00:00.001000
dtype: timedelta64[ns]

df['Time'] = df['Time'] + pd.to_timedelta(df.groupby('Time').cumcount(), unit='ms')
print (df)
      Price                    Time
0  0.052600 2017-05-21 18:18:01.000
1  0.052500 2017-05-21 18:18:01.001

new_df = df.set_index('Time')
print(new_df.to_json(orient='index'))
{"1495390681000":{"Price":"0.052600"},"1495390681001":{"Price":"0.052500"}}
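The same approach can be wrapped in a small reusable function. This is a sketch under stated assumptions: the helper name add_ms_to_duplicates is hypothetical, the timestamps are assumed to be whole seconds as described in the question, and they are timezone-aware (again using datetime.timezone.utc in place of tzutc()):

```python
import datetime

import pandas as pd


def add_ms_to_duplicates(df: pd.DataFrame, col: str = 'Time') -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which repeated timestamps
    in `col` are shifted by 0 ms, 1 ms, 2 ms, ... in order of appearance."""
    out = df.copy()
    # cumcount numbers the rows inside each group of identical timestamps
    # (0, 1, 2, ...) and keeps the original row order and index.
    offsets = pd.to_timedelta(out.groupby(col).cumcount(), unit='ms')
    out[col] = out[col] + offsets
    return out


utc = datetime.timezone.utc
records = [
    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=utc), 'Price': '0.052600'},
    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=utc), 'Price': '0.052500'},
]
new_df = add_ms_to_duplicates(pd.DataFrame(records)).set_index('Time')

# The index is now unique, so orient='index' serializes without error.
json_str = new_df.to_json(orient='index')
print(json_str)
```

The offsets stay unique only while no single second contains 1000 or more rows; past that point the millisecond offsets spill into the next second and can collide with genuine timestamps there. Because cumcount follows the existing row order, the order of prices within each second is preserved in the new index.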

Answer 2:

You could use .duplicated to keep only the first or last entry for each timestamp. Have a look at pandas.DataFrame.duplicated.
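A minimal sketch of that idea, assuming the rows arrive in chronological order so that "last" means the most recent price within the second; the variable names are illustrative. Unlike Answer 1, this discards data rather than preserving it:

```python
import datetime

import pandas as pd

utc = datetime.timezone.utc
records = [
    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=utc), 'Price': '0.052600'},
    {'Time': datetime.datetime(2017, 5, 21, 18, 18, 1, tzinfo=utc), 'Price': '0.052500'},
]
df = pd.DataFrame(records)

# duplicated(subset='Time', keep='last') flags every row whose timestamp appears
# again later; negating the mask keeps only the last row for each timestamp.
last_only = df[~df.duplicated(subset='Time', keep='last')].set_index('Time')

# keep='first' does the opposite and keeps the earliest row per timestamp.
first_only = df[~df.duplicated(subset='Time', keep='first')].set_index('Time')

# Both indexes are now unique, so orient='index' works.
json_str = last_only.to_json(orient='index')
print(json_str)
```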

How should I handle duplicate times in a time series?
