Pandas: count some values in a column

Posted 2019-05-24 03:27

Question:

I have a DataFrame; here is part of it:

ID,"url","app_name","used_at","active_seconds","device_connection","device_os","device_type","device_usage"
e990fae0f48b7daf52619b5ccbec61bc,"",Phone,2015-05-01 09:29:11,13,3g,android,smartphone,home     
e990fae0f48b7daf52619b5ccbec61bc,"",Phone,2015-05-01 09:33:00,3,unknown,android,smartphone,home     
e990fae0f48b7daf52619b5ccbec61bc,"",Phone,2015-06-01 09:33:07,1,unknown,android,smartphone,home     
e990fae0f48b7daf52619b5ccbec61bc,"",Phone,2015-06-01 09:34:30,5,unknown,android,smartphone,home     
e990fae0f48b7daf52619b5ccbec61bc,"",Messaging,2015-06-01 09:36:22,133,3g,android,smartphone,home        
e990fae0f48b7daf52619b5ccbec61bc,"",Messaging,2015-05-02 09:38:40,5,3g,android,smartphone,home      
574c4969b017ae6481db9a7c77328bc3,"",Yandex.Navigator,2015-05-01 11:04:48,70,3g,ios,smartphone,home      
574c4969b017ae6481db9a7c77328bc3,"",VK Client,2015-6-01 12:02:27,248,3g,ios,smartphone,home     
574c4969b017ae6481db9a7c77328bc3,"",Viber,2015-07-01 12:06:35,7,3g,ios,smartphone,home      
574c4969b017ae6481db9a7c77328bc3,"",VK Client,2015-08-01 12:23:26,86,3g,ios,smartphone,home     
574c4969b017ae6481db9a7c77328bc3,"",Talking Angela,2015-08-02 12:24:52,0,3g,ios,smartphone,home     
574c4969b017ae6481db9a7c77328bc3,"",My Talking Angela,2015-08-03 12:24:52,167,3g,ios,smartphone,home        
574c4969b017ae6481db9a7c77328bc3,"",Talking Angela,2015-08-04 12:27:39,34,3g,ios,smartphone,home        

I need to count the number of days in each month for every ID.

If I try df.groupby('ID')['used_at'].count() I get the number of visits. How can I count the days per month instead?

Answer 1:

I think you need to group by ID, month and day, and aggregate with size:

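# used_at must be a datetime column for the .dt accessor (assumes pandas is imported as pd)
df['used_at'] = pd.to_datetime(df['used_at'])

# number of rows per ID, month and day
df1 = df.used_at.groupby([df['ID'], df.used_at.dt.month, df.used_at.dt.day]).size()

print(df1)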
ID                                used_at  used_at
574c4969b017ae6481db9a7c77328bc3  5        1          1
                                  6        1          1
                                  7        1          1
                                  8        1          1
                                           2          1
                                           3          1
                                           4          1
e990fae0f48b7daf52619b5ccbec61bc  5        1          2
                                           2          1
                                  6        1          3
dtype: int64

Or group by date, which is the same as grouping by year, month and day (so the same month in different years is kept apart):

df1 = df.used_at.groupby([df['ID'], df.used_at.dt.date]).size()

print(df1)
ID                                used_at   
574c4969b017ae6481db9a7c77328bc3  2015-05-01    1
                                  2015-06-01    1
                                  2015-07-01    1
                                  2015-08-01    1
                                  2015-08-02    1
                                  2015-08-03    1
                                  2015-08-04    1
e990fae0f48b7daf52619b5ccbec61bc  2015-05-01    2
                                  2015-05-02    1
                                  2015-06-01    3
dtype: int64
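
Both results above give the number of rows per day. If what you want is the number of distinct days in each month for every ID, as the question is worded, one possible approach is to group by ID and month and count the unique dates. This is only a sketch: the sample is reduced to the ID and used_at columns from the question, and the helper columns month and day are names I made up for illustration.

```python
import pandas as pd

# Reduced sample: IDs and timestamps from the question, other columns dropped
df = pd.DataFrame({
    "ID": ["e990fae0f48b7daf52619b5ccbec61bc"] * 6
          + ["574c4969b017ae6481db9a7c77328bc3"] * 7,
    "used_at": [
        "2015-05-01 09:29:11", "2015-05-01 09:33:00", "2015-06-01 09:33:07",
        "2015-06-01 09:34:30", "2015-06-01 09:36:22", "2015-05-02 09:38:40",
        "2015-05-01 11:04:48", "2015-06-01 12:02:27", "2015-07-01 12:06:35",
        "2015-08-01 12:23:26", "2015-08-02 12:24:52", "2015-08-03 12:24:52",
        "2015-08-04 12:27:39",
    ],
})
df["used_at"] = pd.to_datetime(df["used_at"])

# Hypothetical helper columns: the year-month period and the date without time
helper = df.assign(
    month=df["used_at"].dt.to_period("M"),
    day=df["used_at"].dt.normalize(),
)

# Distinct days per ID and month
days_per_month = helper.groupby(["ID", "month"])["day"].nunique()
print(days_per_month)
```

For the sample rows this gives 2 distinct days in 2015-05 and 1 in 2015-06 for the ID starting with e990, and 4 distinct days in 2015-08 for the ID starting with 574c.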

Difference between count and size:

size counts all rows in each group, including rows with NaN values; count counts only the non-NaN values.
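
A small illustration of that difference, using a made-up frame (the IDs a and b and the url values are not from the question):

```python
import numpy as np
import pandas as pd

# Toy data: group "a" has two missing url values, group "b" has none
df = pd.DataFrame({
    "ID": ["a", "a", "a", "b"],
    "url": ["x", np.nan, np.nan, "y"],
})

grouped = df.groupby("ID")["url"]

sizes = grouped.size()    # rows per group, NaN included: a -> 3, b -> 1
counts = grouped.count()  # non-NaN values per group:    a -> 1, b -> 1

print(sizes)
print(counts)
```

In the question's data used_at has no missing values, so size and count would return the same numbers there.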