Pandas; transform column with MM:SS,decimals into

Posted 2019-07-20 22:37

Question:

Hey: I spent several hours trying to do something quite simple, but couldn't figure it out.

I have a dataframe with a column, df['Time'], which contains times starting from 0 and going up to 20 minutes, like this:

1:10,10
1:16,32
3:03,04

The first part is minutes, the second is seconds, and the third is the fraction of a second (only two digits, so hundredths rather than true milliseconds).

Is there a way to automatically convert that column into seconds with Pandas, without making that column the time index of the series?

I already tried the following, but it doesn't work:

pd.to_datetime(df['Time']).convert('s')   # AttributeError: 'Series' object has no attribute 'convert'

If the only way is to parse the time manually, just point that out and I will write up a proper, detailed answer to this question myself; don't waste your time =) Thank you!

Answer 1:

Code:

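import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame({'Time': ['1:10,10', '1:16,32', '3:03,04']})
# Parse minutes:seconds,fraction; %f pads the fraction on the right, so '10' means 0.10 s
df['time'] = df.Time.apply(lambda x: datetime.datetime.strptime(x, '%M:%S,%f'))
# Subtract the parsed zero time to turn each datetime into a timedelta
df['timedelta'] = df.time - datetime.datetime.strptime('00:00,0', '%M:%S,%f')
# Divide by one second to get the duration as float seconds
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)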

Output:

      Time                       time       timedelta    secs
0  1:10,10 1900-01-01 00:01:10.100000 00:01:10.100000   70.10
1  1:16,32 1900-01-01 00:01:16.320000 00:01:16.320000   76.32
2  3:03,04 1900-01-01 00:03:03.040000 00:03:03.040000  183.04
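
As a possible alternative that skips the intermediate datetime column, the strings can be split with a regular expression and the seconds computed arithmetically. This is only a sketch: it assumes every value matches minutes:seconds,hundredths exactly, and the names parts and centis are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['1:10,10', '1:16,32', '3:03,04']})

# Split each string into minutes, seconds and hundredths of a second.
# Assumption: every row matches; a non-matching row would give NaN and astype(int) would fail.
pattern = r'^(?P<minutes>\d+):(?P<seconds>\d{2}),(?P<centis>\d{2})$'
parts = df['Time'].str.extract(pattern).astype(int)

# Combine the pieces into float seconds; the index of df is left untouched.
df['secs'] = parts['minutes'] * 60 + parts['seconds'] + parts['centis'] / 100
print(df)
```

The Time column stays an ordinary column here, so nothing becomes the time index.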

If you also have negative time deltas:

import datetime
import re

import numpy as np
import pandas as pd

# Optional minus sign, optional "minutes:", seconds, optional ",centiseconds"
regex = re.compile(r"(?P<minus>-)?((?P<minutes>\d+):)?(?P<seconds>\d+)(,(?P<centiseconds>\d{2}))?")

def parse_time(time_str):
    parts = regex.match(time_str)
    if not parts:
        return
    parts = parts.groupdict()
    time_params = {}
    # dict.items() replaces Python 2's iteritems()
    for (name, param) in parts.items():
        if param and (name != 'minus'):
            time_params[name] = int(param)
    # timedelta has no centiseconds argument; convert to milliseconds (0 if absent)
    time_params['milliseconds'] = time_params.pop('centiseconds', 0) * 10
    return (-1 if parts['minus'] else 1) * datetime.timedelta(**time_params)

df = pd.DataFrame({'Time': ['-1:10,10', '1:16,32', '3:03,04']})
df['timedelta'] = df.Time.apply(parse_time)
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)

Output:

       Time        timedelta    secs
0  -1:10,10 -00:01:10.100000  -70.10
1   1:16,32  00:01:16.320000   76.32
2   3:03,04  00:03:03.040000  183.04
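
If only the number of seconds is needed, the timedelta step could be skipped altogether. The following is a minimal sketch under the same format assumptions (optional minus sign, optional minutes, two-digit hundredths); to_seconds and _PATTERN are hypothetical names, not part of pandas.

```python
import re

# Hypothetical pattern: optional minus, optional "minutes:", seconds, optional ",hundredths"
_PATTERN = re.compile(r'(-)?(?:(\d+):)?(\d+)(?:,(\d{2}))?')

def to_seconds(text):
    """Return the signed duration in seconds as a float, or None if text does not match."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        return None
    minus, minutes, seconds, centis = match.groups()
    # Missing minutes or hundredths are treated as zero.
    total = int(minutes or 0) * 60 + int(seconds) + int(centis or 0) / 100
    return -total if minus else total

for value in ['-1:10,10', '1:16,32', '3:03,04']:
    print(value, to_seconds(value))
```

With a dataframe, this could be applied as df['Time'].map(to_seconds).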