I have a file containing duplicate timestamps, at most two of each. They are not true duplicates: the second occurrence just needs milliseconds appended to it. For example, the file contains:
....
2011/1/4 9:14:00
2011/1/4 9:15:00
2011/1/4 9:15:01
2011/1/4 9:15:01
2011/1/4 9:15:02
2011/1/4 9:15:02
2011/1/4 9:15:03
2011/1/4 9:15:03
2011/1/4 9:15:04
....
I would like to change them to:
2011/1/4 9:14:00
2011/1/4 9:15:00
2011/1/4 9:15:01
2011/1/4 9:15:01.500
2011/1/4 9:15:02
2011/1/4 9:15:02.500
2011/1/4 9:15:03
2011/1/4 9:15:03.500
2011/1/4 9:15:04
....
What is the most efficient way to perform such a task?
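(A minimal pure-Python sketch of the requested transformation, assuming one timestamp per line and sequential duplicates; the function name is made up for illustration:)

```python
def add_millis_to_dupes(lines, suffix=".500"):
    """Append a millisecond suffix to the second copy of a repeated timestamp."""
    out = []
    prev = None
    for line in lines:
        ts = line.rstrip("\n")
        # Only the second occurrence of a repeated timestamp gets the suffix
        out.append(ts + suffix if ts == prev else ts)
        prev = ts
    return out
```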
Assuming, as you have shown in your example, that the duplicates are sequential:

This algorithm should work very well; the awkward part is NumPy's datetime datatypes. The idea is to add the two arrays together, but one is datetime64 and the other is float64, and np.timedelta64 doesn't seem to operate elementwise on arrays. If you can sort out the dtype issues, that approach will work. With pandas it is more direct.

Setup
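As a sketch of the setup (the column name 'time' is an assumption; the values are the sample rows from the question):

```python
import pandas as pd

# Parse the sample timestamps into a single datetime64 column
df = pd.DataFrame({"time": pd.to_datetime([
    "2011/1/4 9:15:00",
    "2011/1/4 9:15:01",
    "2011/1/4 9:15:01",
    "2011/1/4 9:15:02",
    "2011/1/4 9:15:02",
], format="%Y/%m/%d %H:%M:%S")})
```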
Find the locations where the difference in time from the previous row is 0 seconds.
Set those locations to use an offset of 5 milliseconds (your question used 500, but it could be anything). This requires numpy >= 1.7. (Note that this syntax will change in pandas 0.13 to allow the more direct:
df.loc[mask,'time'] += pd.offsets.Milli(5)
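Putting the two steps together, a minimal sketch (using the 500 ms offset from the question; the column name 'time' is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"time": pd.to_datetime([
    "2011/1/4 9:15:00",
    "2011/1/4 9:15:01",
    "2011/1/4 9:15:01",
    "2011/1/4 9:15:02",
    "2011/1/4 9:15:02",
], format="%Y/%m/%d %H:%M:%S")})

# Step 1: rows whose time equals the previous row's time
# (diff() is NaT for the first row, which compares as False)
mask = df["time"].diff() == pd.Timedelta(0)

# Step 2: shift only those rows by 500 milliseconds
df.loc[mask, "time"] += pd.offsets.Milli(500)
```

Because the duplicates are sequential, a single diff-based mask is enough; no sorting or grouping is needed.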