Converting date formats in pandas dataframe

Posted 2020-04-30 03:35

Question:

I have a dataframe and the Date column has two different types of date formats going on.

e.g. 1983-11-10 00:00:00 and 10/11/1983

I want them all to be the same type. How can I iterate through the Date column of my dataframe and convert the dates to one format?

Answer 1:

I believe you need the parameter dayfirst=True in to_datetime:

import pandas as pd

df = pd.DataFrame({'Date': {0: '1983-11-10 00:00:00', 1: '10/11/1983'}})
print (df)
                  Date
0  1983-11-10 00:00:00
1           10/11/1983


df['Date'] = pd.to_datetime(df.Date, dayfirst=True)
print (df)
        Date
0 1983-11-10
1 1983-11-10

because without dayfirst=True, parsing the original strings reads 10/11/1983 month-first, as 11 October:

df['Date'] = pd.to_datetime(df.Date)
print (df)
        Date
0 1983-11-10
1 1983-10-11
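
To see the ambiguity in isolation, here is a minimal sketch that parses the single string 10/11/1983 with and without dayfirst. It only calls pd.to_datetime on a scalar; the variable names are illustrative and not part of the answer.

```python
import pandas as pd

ambiguous = '10/11/1983'

# Default parsing treats the first number as the month.
month_first = pd.to_datetime(ambiguous)
# dayfirst=True treats the first number as the day.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first.date().isoformat())  # 1983-10-11
print(day_first.date().isoformat())    # 1983-11-10
```

The same string yields two different dates, so the choice has to come from knowledge of where the data originated, not from the data itself.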

Or you can specify both formats and then use combine_first:

d1 = pd.to_datetime(df.Date, format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df.Date, format='%d/%m/%Y', errors='coerce')

df['Date'] = d1.combine_first(d2)
print (df)
        Date
0 1983-11-10
1 1983-11-10
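
It can help to look at the two intermediate Series before they are merged. The sketch below rebuilds the sample frame from this answer and prints which rows each explicit format failed to match. With errors='coerce', a value that does not fit the format becomes NaT, and combine_first fills those gaps from the second Series. Because each call names an exact format, this should not depend on how a particular pandas version guesses formats, though I have not checked every release.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['1983-11-10 00:00:00', '10/11/1983']})

# Each call parses only the rows that fit its format; the rest become NaT.
d1 = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S', errors='coerce')
d2 = pd.to_datetime(df['Date'], format='%d/%m/%Y', errors='coerce')

print(d1.isna().tolist())  # [False, True]
print(d2.isna().tolist())  # [True, False]

# combine_first keeps d1 where it has a value and falls back to d2 elsewhere.
combined = d1.combine_first(d2)
print(combined.dt.strftime('%Y-%m-%d').tolist())  # ['1983-11-10', '1983-11-10']
```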

General solution for multiple formats:

from functools import reduce 

def convert_formats_to_datetimes(col, formats):
    out = [pd.to_datetime(col, format=x, errors='coerce') for x in formats]
    return reduce(lambda l,r: pd.Series.combine_first(l,r), out)

formats = ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y']
df['Date'] = df['Date'].pipe(convert_formats_to_datetimes, formats)
print (df)
        Date
0 1983-11-10
1 1983-11-10
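
One thing the general solution does not surface is which values matched none of the formats; they simply end up as NaT. A possible variation, sketched here with a hypothetical helper name (parse_with_report) and a made-up third sample value, returns the unmatched raw strings alongside the parsed dates:

```python
from functools import reduce

import pandas as pd


def parse_with_report(col, formats):
    """Try each format in order; return parsed dates and the unmatched raw values."""
    attempts = [pd.to_datetime(col, format=f, errors='coerce') for f in formats]
    parsed = reduce(lambda left, right: left.combine_first(right), attempts)
    # Rows that had a value originally but are still NaT matched none of the formats.
    unmatched = col[parsed.isna() & col.notna()]
    return parsed, unmatched


# 'Nov 10 1983' is an invented value that fits neither format.
raw = pd.Series(['1983-11-10 00:00:00', '10/11/1983', 'Nov 10 1983'])
parsed, unmatched = parse_with_report(raw, ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])

print(parsed)
print(unmatched.tolist())  # ['Nov 10 1983']
```

Inspecting the unmatched values is a quick way to decide whether another format needs to be added to the list.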


Answer 2:

I want them all to be the same type, how can I iterate through the Date column of my dataframe and convert the dates to one format?

Your input data is ambiguous: is 10/11 the 10th of November or the 11th of October? You need to specify logic to determine which is appropriate. A function is useful if you wish to try multiple date formats sequentially:

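import pandas as pd

def date_apply_formats(s, form_lst):
    # Parse with the highest-priority format first; non-matching values become NaT.
    out = pd.to_datetime(s, format=form_lst[0], errors='coerce')
    for form in form_lst[1:]:
        # Parse the original strings (not the partly converted result) with each fallback format.
        out = out.fillna(pd.to_datetime(s, format=form, errors='coerce'))
    return out

df['Date'] = date_apply_formats(df['Date'], ['%Y-%m-%d %H:%M:%S', '%d/%m/%Y'])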

Priority is given to the first item in form_lst: a value that matches an earlier format is never re-parsed with a later one. The approach extends to any number of formats.
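
To illustrate how the order of formats decides the outcome for ambiguous values, here is a small self-contained sketch. The helper name first_matching_format and the second sample value are my own; the fallback logic is a fillna loop that always parses the original strings.

```python
import pandas as pd


def first_matching_format(s, formats):
    """Parse s with each format in turn; earlier formats win."""
    out = pd.to_datetime(s, format=formats[0], errors='coerce')
    for fmt in formats[1:]:
        # Always parse the original strings, then fill only the remaining gaps.
        out = out.fillna(pd.to_datetime(s, format=fmt, errors='coerce'))
    return out


# '10/11/1983' fits both formats; '25/11/1983' can only be day-first.
raw = pd.Series(['10/11/1983', '25/11/1983'])

day_first = first_matching_format(raw, ['%d/%m/%Y', '%m/%d/%Y'])
month_first = first_matching_format(raw, ['%m/%d/%Y', '%d/%m/%Y'])

print(day_first.dt.strftime('%Y-%m-%d').tolist())    # ['1983-11-10', '1983-11-25']
print(month_first.dt.strftime('%Y-%m-%d').tolist())  # ['1983-10-11', '1983-11-25']
```

In the second call the two rows end up read under different conventions, which is the kind of silent inconsistency the ambiguity warning is about.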



Answer 3:

Input data is:

  NSECODE      Date      Close
1  NSE500  20000103  1291.5500
2  NSE500  20000104  1335.4500
3  NSE500  20000105  1303.8000

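# Parse the compact YYYYMMDD values explicitly; astype(str) also covers an integer column.
history_nseindex_df["Date"] = pd.to_datetime(history_nseindex_df["Date"].astype(str), format="%Y%m%d")
# Render as YYYY-MM-DD strings (the column holds text again after this step).
history_nseindex_df["Date"] = history_nseindex_df["Date"].dt.strftime("%Y-%m-%d")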

Output is now:

  NSECODE        Date      Close
1  NSE500  2000-01-03  1291.5500
2  NSE500  2000-01-04  1335.4500
3  NSE500  2000-01-05  1303.8000
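
Keep in mind that dt.strftime returns text, so after the second step the column no longer has a datetime dtype. The sketch below uses a hypothetical frame that mirrors the sample rows, with Date assumed to be stored as integers, and keeps the datetime and text versions in separate variables so both can be inspected:

```python
import pandas as pd

# Hypothetical frame mirroring the sample rows; Date is assumed to be integer-typed.
history = pd.DataFrame({
    'NSECODE': ['NSE500', 'NSE500', 'NSE500'],
    'Date': [20000103, 20000104, 20000105],
    'Close': [1291.55, 1335.45, 1303.80],
})

# Explicit format: YYYYMMDD digits, converted to strings first.
as_datetime = pd.to_datetime(history['Date'].astype(str), format='%Y%m%d')
# Formatting for display turns each value into a plain string.
as_text = as_datetime.dt.strftime('%Y-%m-%d')

print(as_datetime.dt.year.tolist())    # [2000, 2000, 2000]
print(as_text.tolist())                # ['2000-01-03', '2000-01-04', '2000-01-05']
print(type(as_text.iloc[0]).__name__)  # str
```

If later steps need date arithmetic or sorting by date, it is probably better to keep the datetime column and format it only when displaying or exporting.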