How to combine year, month, and day columns to sin

2019-01-27 02:24发布

问题:

I have the following dataframe df:

        id  lat        lon      year    month   day         
0       381 53.30660   -0.54649 2004    1       2       
1       381 53.30660   -0.54649 2004    1       3            
2       381 53.30660   -0.54649 2004    1       4   

and I want to create a new column df['Date'] where the year, month, and day columns are combined according to the format yyyy-m-d.

Following this post, I did:

`df['Date']=pd.to_datetime(df['year']*10000000000
                           +df['month']*100000000
                           +df['day']*1000000,
                           format='%Y-%m-%d%')`

The result is not what I expected, as it starts from 1970 instead of 2004, and it also contains the hour stamp, which I did not specify:

        id  lat        lon      year    month   day  Date           
0       381 53.30660   -0.54649 2004    1       2    1970-01-01 05:34:00.102    
1       381 53.30660   -0.54649 2004    1       3    1970-01-01 05:34:00.103         
2       381 53.30660   -0.54649 2004    1       4    1970-01-01 05:34:00.104

As the dates should be in the 2004-1-2 format, what am I doing wrong?

回答1:

There is an easier way:

In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])

In [251]: df
Out[251]:
    id      lat      lon  year  month  day       Date
0  381  53.3066 -0.54649  2004      1    2 2004-01-02
1  381  53.3066 -0.54649  2004      1    3 2004-01-03
2  381  53.3066 -0.54649  2004      1    4 2004-01-04

from docs:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [year, month, day, minute, second, ms, us, ns]) or plurals of the same



回答2:

To fix your code

df['Date']=pd.to_datetime(df.year*10000+df.month*100+df.day,format='%Y%m%d')
df
Out[57]: 
    id      lat      lon  year  month  day       Date
0  381  53.3066 -0.54649  2004      1    2 2004-01-02
1  381  53.3066 -0.54649  2004      1    3 2004-01-03
2  381  53.3066 -0.54649  2004      1    4 2004-01-04


回答3:

One solution would be to convert these columns to string, concatenate using agg + str.join, and then convert to datetime.

df['Date'] = pd.to_datetime(
    df[['year', 'month', 'day']].astype(str).agg('-'.join, axis=1))

df

    id      lat      lon  year  month  day       Date
0  381  53.3066 -0.54649  2004      1    2 2004-01-02
1  381  53.3066 -0.54649  2004      1    3 2004-01-03
2  381  53.3066 -0.54649  2004      1    4 2004-01-04

You may also want to add an errors='coerce' argument if you have invalid datetime combinations between your columns.