pandas convert some columns into rows

2019-01-01 08:51发布

问题:

So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like

location    name    Jan-2010    Feb-2010    March-2010
A           \"test\"  12          20          30
B           \"foo\"   18          20          25

What I would like is for it to look like

location    name    Date        Value
A           \"test\"  Jan-2010    12       
A           \"test\"  Feb-2010    20
A           \"test\"  March-2010  30
B           \"foo\"   Jan-2010    18       
B           \"foo\"   Feb-2010    20
B           \"foo\"   March-2010  25

problem is I don\'t know how many dates are in the column (though I know they will always start after name)

回答1:

You can use pd.melt to get most of the way there, and then sort:

>>> df
  location  name  Jan-2010  Feb-2010  March-2010
0        A  test        12        20          30
1        B   foo        18        20          25
>>> df2 = pd.melt(df, id_vars=[\"location\", \"name\"], 
                  var_name=\"Date\", value_name=\"Value\")
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
1        B   foo    Jan-2010     18
2        A  test    Feb-2010     20
3        B   foo    Feb-2010     20
4        A  test  March-2010     30
5        B   foo  March-2010     25
>>> df2 = df2.sort([\"location\", \"name\"])
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
2        A  test    Feb-2010     20
4        A  test  March-2010     30
1        B   foo    Jan-2010     18
3        B   foo    Feb-2010     20
5        B   foo  March-2010     25

(Might want to throw in a .reset_index(drop=True), just to keep the output clean.)

Note: pd.DataFrame.sort has been deprecated in favour of pd.DataFrame.sort_values.



回答2:

I guess I found a simpler solution

temp1 = pd.melt(df1, id_vars=[\"location\"], var_name=\'Date\', value_name=\'Value\')
temp2 = pd.melt(df1, id_vars=[\"name\"], var_name=\'Date\', value_name=\'Value\')

Concat whole temp1 with temp2\'s column name

temp1[\'new_column\'] = temp2[\'name\']

You now have what you asked for.



回答3:

pd.wide_to_long

You can add a prefix to your year columns and then feed directly to pd.wide_to_long. I won\'t pretend this is efficient, but it may in certain situations be more convenient than pd.melt, e.g. when your columns already have an appropriate prefix.

df.columns = np.hstack((df.columns[:2], df.columns[2:].map(lambda x: f\'Value{x}\')))

res = pd.wide_to_long(df, stubnames=[\'Value\'], i=\'name\', j=\'Date\').reset_index()\\
        .sort_values([\'location\', \'name\'])

print(res)

   name        Date location  Value
0  test    Jan-2010        A     12
2  test    Feb-2010        A     20
4  test  March-2010        A     30
1   foo    Jan-2010        B     18
3   foo    Feb-2010        B     20
5   foo  March-2010        B     25


标签: python pandas