I want to count how many times the value increases from one row to the next, and the difference between the first element and the last element, within each group of a groupby. But I can't apply my functions to the groupby. After a groupby, is each group a list? Also, what's the difference between "apply" and "agg"? Sorry, I have only been using Python for a few days.
def promotion(ls):
    # count how many times the value rises from one row to the next
    pro = 0
    if len(ls) > 1:
        for j in range(1, len(ls)):
            if ls[j] > ls[j-1]:
                pro += 1
    return pro

def growth(ls):
    # last value minus first value
    head = ls[0]
    tail = ls[len(ls)-1]
    gro = tail - head
    return gro

titlePromotion = JobData.groupby("candidate_id")["TitleLevel"].apply(promotion)
titleGrowth = JobData.groupby("candidate_id")["TitleLevel"].apply(growth)
The data is:
candidate_id TitleLevel othercols
1 2 foo
2 1 bar
2 2 goo
2 1 gar
The result should be
titlePromotion
candidate_id
1 0
2 1
titleGrowth
candidate_id
1 0
2 0
You could just use a lambda in apply, like this:
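A minimal sketch of that idea for the growth calculation, assuming the sample DataFrame from the question with its default 0-based index. The lambda picks the last and first values with iloc (by position) rather than s[0], because each group keeps its original row labels.

```python
import pandas as pd

# Sample data from the question (assumed default RangeIndex 0..3).
JobData = pd.DataFrame({
    "candidate_id": [1, 2, 2, 2],
    "TitleLevel": [2, 1, 2, 1],
    "othercols": ["foo", "bar", "goo", "gar"],
})

# apply hands each group's TitleLevel values to the lambda as a Series;
# the lambda returns one number, so the result has one value per candidate.
titleGrowth = JobData.groupby("candidate_id")["TitleLevel"].apply(
    lambda s: s.iloc[-1] - s.iloc[0]  # last minus first, selected by position
)
print(titleGrowth)
```

With the sample data both candidates end at the level they started at, so both growth values are 0.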
A lambda can also subtract each group's first value from all of the group's values; apply then returns one value per row instead of one per group.
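For instance, a lambda that returns a whole Series per group (each value minus the group's first value) could look like the following sketch. The name fromFirst is hypothetical, and group_keys=False is an addition here so that the result keeps the original row labels instead of gaining candidate_id as an extra index level.

```python
import pandas as pd

# Sample data from the question (assumed default RangeIndex 0..3).
JobData = pd.DataFrame({
    "candidate_id": [1, 2, 2, 2],
    "TitleLevel": [2, 1, 2, 1],
    "othercols": ["foo", "bar", "goo", "gar"],
})

# The lambda returns a Series with one entry per row of the group,
# and apply glues those per-group Series back together.
fromFirst = JobData.groupby("candidate_id", group_keys=False)["TitleLevel"].apply(
    lambda s: s - s.iloc[0]  # every value minus the group's first value
)
print(fromFirst)
```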
If you try the same thing with agg, it won't work, because agg needs to produce a single value for each group.
If you subtract the values of particular cells (say, the last minus the first), there is no difference between agg and apply: both produce one value for each group.
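A quick check of that claim, as a sketch on the question's sample data: the same last-minus-first lambda is passed to both apply and agg. The names viaApply and viaAgg are illustrative only.

```python
import pandas as pd

# Sample data from the question.
JobData = pd.DataFrame({
    "candidate_id": [1, 2, 2, 2],
    "TitleLevel": [2, 1, 2, 1],
    "othercols": ["foo", "bar", "goo", "gar"],
})

grouped = JobData.groupby("candidate_id")["TitleLevel"]

# Both calls reduce every group to a single number (last minus first).
viaApply = grouped.apply(lambda s: s.iloc[-1] - s.iloc[0])
viaAgg = grouped.agg(lambda s: s.iloc[-1] - s.iloc[0])

print(viaApply.tolist(), viaAgg.tolist())
```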
If you would like, for example, to take the difference between each row's value and the previous row's value (to get the increment for each row), you could use transform like this:
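One possible version of that transform call, reconstructed as an assumption since the original snippet is missing: within each group the lambda subtracts the shifted (previous) value from each value, so the output has one entry per original row. The name increment is hypothetical.

```python
import pandas as pd

# Sample data from the question (assumed default RangeIndex 0..3).
JobData = pd.DataFrame({
    "candidate_id": [1, 2, 2, 2],
    "TitleLevel": [2, 1, 2, 1],
    "othercols": ["foo", "bar", "goo", "gar"],
})

# transform returns a result aligned with the original rows.
# The first row of each group has no previous row, so its increment is NaN.
increment = JobData.groupby("candidate_id")["TitleLevel"].transform(
    lambda s: s - s.shift()
)
print(increment)
```

Counting the positive increments per candidate would then give the number of promotions.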
Or, more simply for this particular problem:
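The original snippet is not preserved; a likely candidate (an assumption) is the built-in grouped diff, which returns the same row-by-row increments without a lambda. From there both results asked for in the question follow in a couple of vectorized steps, sketched here on the sample data:

```python
import pandas as pd

# Sample data from the question.
JobData = pd.DataFrame({
    "candidate_id": [1, 2, 2, 2],
    "TitleLevel": [2, 1, 2, 1],
    "othercols": ["foo", "bar", "goo", "gar"],
})

grouped = JobData.groupby("candidate_id")["TitleLevel"]

# Row-to-row difference within each group (NaN for each group's first row).
steps = grouped.diff()

# Promotions: number of positive steps per candidate (NaN > 0 is False).
titlePromotion = (steps > 0).groupby(JobData["candidate_id"]).sum()

# Growth: last value minus first value of each group.
titleGrowth = grouped.last() - grouped.first()

print(titlePromotion)
print(titleGrowth)
```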
Some tips:
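If you define a generic function foo that simply prints its argument (and the argument's type), and call it through groupby(...).apply, Python will print every object that apply hands to it.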
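The exact function from this answer was not preserved, so the following is a hypothetical version of such a probe, run on the question's sample data: foo prints the type and contents of whatever apply passes in, and returns the group's length only so that apply has something to collect.

```python
import pandas as pd

# Sample data from the question.
JobData = pd.DataFrame({
    "candidate_id": [1, 2, 2, 2],
    "TitleLevel": [2, 1, 2, 1],
    "othercols": ["foo", "bar", "goo", "gar"],
})

def foo(x):
    # Debugging helper: show exactly what apply passes in for each group.
    print(type(x))
    print(x)
    print("-" * 20)
    return len(x)  # hypothetical return value; any scalar would do

result = JobData.groupby("candidate_id")["TitleLevel"].apply(foo)
```

Each printed group shows its original row labels next to the values, which is the clue used further down.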
This is a low-brow but effective way to discover that calling
jobData.groupby(...)[...].apply(foo)
passes a Series to foo.

The apply method calls foo once for every group and glues the per-group results together, returning a Series or a DataFrame. It is possible to use apply when foo returns an object such as a numerical value or a string, but in such cases I think using agg is preferred. A typical use case for apply is when you want to, say, square every value in a group and thus need to return a new group of the same shape.

The transform method is also useful in this situation (when you want to transform every value in the group and thus need to return something of the same shape), but the result can differ from that of apply, since a different object may be passed to foo. For example, each column of a grouped DataFrame would be passed to foo when using transform, while the entire group would be passed to foo when using apply. The easiest way to understand this is to experiment with a simple DataFrame and the generic foo.

The agg method also calls foo once for every group, but unlike with apply, foo should return a single value per group: the group is aggregated into one value. A typical use case for agg is when you want to count the number of items in the group.

You can debug and understand what went wrong with your original code by passing the generic foo function to the same groupby(...).apply call. This shows you the Series that are being passed to foo. Notice that in the second Series the index values start at 1, because each group keeps the row labels it had in the original DataFrame. So ls[0] raises a KeyError, since there is no label with value 0 in the second Series.

What you really want is the first item in the Series, by position. That is what iloc is for.

So to summarize: use ls[label] to select the row of a Series whose index value is label, and use ls.iloc[n] to select the nth row of the Series.

Thus, to fix your code with the least amount of change, you could replace the positional lookups in promotion and growth (ls[0], ls[j], ls[j-1] and ls[len(ls)-1]) with their iloc equivalents.
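A sketch of that minimal change, assuming the question's sample data: both functions keep their original structure but select by position with iloc. Note that promotion needs it too, since ls[j-1] is a label lookup for ls[0] when j is 1.

```python
import pandas as pd

# Sample data from the question.
JobData = pd.DataFrame({
    "candidate_id": [1, 2, 2, 2],
    "TitleLevel": [2, 1, 2, 1],
    "othercols": ["foo", "bar", "goo", "gar"],
})

def promotion(ls):
    # Count how many times the value rises from one row to the next.
    pro = 0
    if len(ls) > 1:
        for j in range(1, len(ls)):
            # iloc selects by position; ls[j] would look up the index label j.
            if ls.iloc[j] > ls.iloc[j - 1]:
                pro += 1
    return pro

def growth(ls):
    # Last value minus first value, both selected by position.
    head = ls.iloc[0]
    tail = ls.iloc[-1]
    return tail - head

titlePromotion = JobData.groupby("candidate_id")["TitleLevel"].apply(promotion)
titleGrowth = JobData.groupby("candidate_id")["TitleLevel"].apply(growth)
print(titlePromotion)
print(titleGrowth)
```

With the sample data this reproduces the expected result from the question: promotions 0 and 1, growth 0 and 0.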