I'm loading a csv file, which has the following columns: date, textA, textB, numberA, numberB
I want to group by the columns: date, textA and textB - but want to apply "sum" to numberA, but "min" to numberB.
data = pd.read_table("file.csv", sep=",", thousands=',')
grouped = data.groupby(["date", "textA", "textB"], as_index=False)
...but I cannot see how to then apply two different aggregate functions, to two different columns?
I.e. sum(numberA), min(numberB)
The
agg
method can accept a dict, in which case the keys indicate the column to which the function is applied:For example,
yields
This also shows that Pandas can handle spaces in column names. I'm not sure what the origin of the problem was, but literal spaces should not have posed a problem. If you wish to investigate this further,
without reassigning the column names, will show show us the
repr
of the names. Maybe there was a hard-to-see character in the column name that looked like a space (or some other character) but was actually au'\xa0'
(NO-BREAK SPACE), for example.