So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like
location name Jan-2010 Feb-2010 March-2010
A \"test\" 12 20 30
B \"foo\" 18 20 25
What I would like is for it to look like
location name Date Value
A \"test\" Jan-2010 12
A \"test\" Feb-2010 20
A \"test\" March-2010 30
B \"foo\" Jan-2010 18
B \"foo\" Feb-2010 20
B \"foo\" March-2010 25
problem is I don\'t know how many dates are in the column (though I know they will always start after name)
You can use pd.melt
to get most of the way there, and then sort:
>>> df
location name Jan-2010 Feb-2010 March-2010
0 A test 12 20 30
1 B foo 18 20 25
>>> df2 = pd.melt(df, id_vars=[\"location\", \"name\"],
var_name=\"Date\", value_name=\"Value\")
>>> df2
location name Date Value
0 A test Jan-2010 12
1 B foo Jan-2010 18
2 A test Feb-2010 20
3 B foo Feb-2010 20
4 A test March-2010 30
5 B foo March-2010 25
>>> df2 = df2.sort([\"location\", \"name\"])
>>> df2
location name Date Value
0 A test Jan-2010 12
2 A test Feb-2010 20
4 A test March-2010 30
1 B foo Jan-2010 18
3 B foo Feb-2010 20
5 B foo March-2010 25
(Might want to throw in a .reset_index(drop=True)
, just to keep the output clean.)
Note: pd.DataFrame.sort
has been deprecated in favour of pd.DataFrame.sort_values
.
I guess I found a simpler solution
temp1 = pd.melt(df1, id_vars=[\"location\"], var_name=\'Date\', value_name=\'Value\')
temp2 = pd.melt(df1, id_vars=[\"name\"], var_name=\'Date\', value_name=\'Value\')
Concat whole temp1
with temp2
\'s column name
temp1[\'new_column\'] = temp2[\'name\']
You now have what you asked for.
pd.wide_to_long
You can add a prefix to your year columns and then feed directly to pd.wide_to_long
. I won\'t pretend this is efficient, but it may in certain situations be more convenient than pd.melt
, e.g. when your columns already have an appropriate prefix.
df.columns = np.hstack((df.columns[:2], df.columns[2:].map(lambda x: f\'Value{x}\')))
res = pd.wide_to_long(df, stubnames=[\'Value\'], i=\'name\', j=\'Date\').reset_index()\\
.sort_values([\'location\', \'name\'])
print(res)
name Date location Value
0 test Jan-2010 A 12
2 test Feb-2010 A 20
4 test March-2010 A 30
1 foo Jan-2010 B 18
3 foo Feb-2010 B 20
5 foo March-2010 B 25