I have a dataframe like the one below. I want to combine the rows that share the same "yyyymmdd" and "hr" into a single row (there are several rows with the same "yyyymmdd" and "hr").
yyyymmdd hr ariel cat kiki mmax vicky gaolie shiu nick ck
10 2015-12-27 9 0 0 0 0 0 0 0 23 0
181 2015-12-27 10 0 0 0 0 0 0 0 2 0
65 2015-12-27 11 0 0 0 0 0 0 0 20 0
4 2015-12-27 12 0 0 0 0 0 0 0 4 0
0 2015-12-27 17 0 0 0 0 0 0 0 2 0
141 2015-12-27 19 1 0 0 0 0 0 0 0 0
160 2015-12-28 8 0 8 0 0 0 0 0 0 0
82 2015-12-28 9 0 0 0 0 0 0 19 0 0
113 2015-12-28 9 11 0 0 0 0 0 0 0 0
180 2015-12-28 9 0 11 0 0 0 0 0 0 0
9 2015-12-28 10 0 13 0 0 0 0 0 0 0
76 2015-12-28 10 85 0 0 0 0 0 0 0 0
107 2015-12-28 10 0 0 0 0 0 0 15 0 0
188 2015-12-28 10 0 0 0 0 2 0 0 0 0
34 2015-12-28 11 0 0 0 0 0 0 14 0 0
69 2015-12-28 11 0 0 0 0 2 0 0 0 0
134 2015-12-28 11 0 11 0 0 0 0 0 0 0
158 2015-12-28 11 2 0 0 0 0 0 0 0 0
Part of the output I want should look like this, for instance:
yyyymmdd hr ariel cat kiki mmax vicky gaolie shiu nick ck
2015-12-28 10 85 13 0 0 2 0 15 0 0
Please share some ideas that I can use in Python pandas or SQL. Thanks!
=========================================================================
Now I have 2 more questions:
How can I "fill" the "hr" index of the dataframe? It should look something like this:
yyyymmdd hr ariel cat kiki mmax vicky gaolie shiu nick ck
0 2015-12-27 8 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2015-12-27 9 0 0 0 0 0 0 0 23 0
2 2015-12-27 10 0 0 0 0 0 0 0 2 0
3 2015-12-27 11 0 0 0 0 0 0 0 20 0
4 2015-12-27 12 0 0 0 0 0 0 0 4 0
5 2015-12-27 13 NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 2015-12-27 14 NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 2015-12-27 15 NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 2015-12-27 16 NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 2015-12-27 17 0 0 0 0 0 0 0 2 0
10 2015-12-27 18 NaN NaN NaN NaN NaN NaN NaN NaN NaN
11 2015-12-27 19 1 0 0 0 0 0 0 0 0
12 2015-12-27 20 NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 2015-12-28 8 0 8 0 0 0 0 0 0 0
14 2015-12-28 9 11 11 0 0 0 0 19 0 0
15 2015-12-28 10 85 13 0 0 2 0 15 0 0
16 2015-12-28 11 2 11 0 0 2 0 14 0 0
17 2015-12-28 12 2 20 0 4 0 0 10 0 0
18 2015-12-28 13 8 9 0 9 3 0 9 0 0
19 2015-12-28 14 4 10 0 8 0 0 22 0 0
20 2015-12-28 15 3 3 0 2 0 0 16 0 0
21 2015-12-28 16 14 5 1 1 0 0 19 0 0
22 2015-12-28 17 15 1 2 0 0 0 19 0 0
23 2015-12-28 18 0 0 0 6 0 0 0 0 0
24 2015-12-28 19 0 0 0 5 0 0 0 0 0
25 2015-12-28 20 0 0 0 1 0 0 0 0 0
How can I plot line charts based on the columns and hr? (x-axis = columns, i.e. ariel, cat, kiki...) (y-axis = hr, i.e. 8, 9, 10...20) Each chart represents one date (i.e. 2015-12-27, 2015-12-28...).
Thanks!!
Put your data into a pandas DataFrame, then group by "yyyymmdd" and "hr" and take the max of each group. After copy-pasting your example into a CSV, it looks like this:
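A minimal sketch of that groupby-and-max step, assuming the rows are whitespace-separated and read from an inline string rather than a CSV file on disk. The names `raw`, `df`, and `combined` are illustrative, and only the hr 10 and hr 11 rows for 2015-12-28 from the question are included:

```python
import io

import pandas as pd

# A few rows copied from the question (assumption: whitespace-separated text).
raw = """yyyymmdd hr ariel cat kiki mmax vicky gaolie shiu nick ck
2015-12-28 10 0 13 0 0 0 0 0 0 0
2015-12-28 10 85 0 0 0 0 0 0 0 0
2015-12-28 10 0 0 0 0 0 0 15 0 0
2015-12-28 10 0 0 0 0 2 0 0 0 0
2015-12-28 11 0 0 0 0 0 0 14 0 0
2015-12-28 11 0 0 0 0 2 0 0 0 0
2015-12-28 11 0 11 0 0 0 0 0 0 0
2015-12-28 11 2 0 0 0 0 0 0 0 0
"""

# Parse the text into a DataFrame; yyyymmdd stays a plain string column here.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Collapse rows sharing the same (yyyymmdd, hr) by taking the column-wise max.
combined = df.groupby(["yyyymmdd", "hr"]).max()
print(combined)
```

Taking the max works for this data because each person's count appears in only one row per hour, with zeros elsewhere. If the same column could hold non-zero values in several rows of one group, `.sum()` may be the better aggregation.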
Output:
Use reset_index() in case you don't want the multi-index.
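As a rough illustration of what `reset_index()` changes, the sketch below uses a small hand-made frame with only two of the value columns; the names `grouped` and `flat` are hypothetical:

```python
import pandas as pd

# Tiny stand-in for the question's data (only two value columns kept).
df = pd.DataFrame({
    "yyyymmdd": ["2015-12-28", "2015-12-28", "2015-12-28"],
    "hr": [9, 9, 10],
    "ariel": [0, 11, 85],
    "shiu": [19, 0, 0],
})

# After groupby, yyyymmdd and hr form a two-level (multi-)index.
grouped = df.groupby(["yyyymmdd", "hr"]).max()

# reset_index() moves both index levels back into ordinary columns.
flat = grouped.reset_index()
print(flat)
```

Passing `as_index=False` to `groupby` should give a similarly flat result without the extra call.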