Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about the apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!
Straight from Wes McKinney's Python for Data Analysis book, pg. 132 (I highly recommend this book):
Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
There's great information in these answers, but I'm adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don't have the rep to comment.
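As a rough illustration of the array-wise versus element-wise split, here is a minimal sketch. The frame df, its columns a and b, and the two_decimals function are made up for the example. The elementwise helper is also an assumption of mine, not a pandas API: DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, so the helper picks whichever one the installed version provides.

```python
import pandas as pd

# Made-up data, purely for illustration
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 40.0]})


def elementwise(frame, func):
    """Call func on every cell of frame (hypothetical helper, not a pandas API)."""
    # DataFrame.applymap was renamed to DataFrame.map in pandas 2.1
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func)
    return frame.applymap(func)


def two_decimals(x):
    """Format a single number with two decimal places."""
    return f"{x:.2f}"


# DataFrame.apply: the function receives a whole column (a Series) at a time
col_range = df.apply(lambda col: col.max() - col.min())

# Element-wise on a DataFrame: the function receives one cell value at a time
formatted = elementwise(df, two_decimals)

# Series.map and Series.apply: the function receives one element at a time
mapped_a = df["a"].map(two_decimals)
applied_a = df["a"].apply(two_decimals)

print(col_range)
print(formatted)
print(mapped_a.equals(applied_a))
```

Note that col_range has one entry per column, while formatted keeps the shape of df.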
DataFrame.apply operates on entire rows or columns at a time.
DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.
There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences though, some of which were discussed in osa's answer.
Probably the simplest explanation of the difference between apply and applymap:
apply takes the whole column as a parameter and then assigns the result to this column.
applymap takes a separate cell value as a parameter and assigns the result back to this cell.
NB: If apply returns a single value, you will have this value instead of the column after assigning, and you will eventually have just a row instead of a matrix.
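A small sketch of the NB above, assuming a made-up frame df. When the function returns a column, apply keeps the shape of the frame; when it returns a single value per column, the result collapses to one value per column. The cell-wise call uses DataFrame.map where it exists, since applymap was renamed in pandas 2.1.

```python
import pandas as pd

# Made-up data, purely for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# The function gets a whole column and returns a column: the shape is preserved
centered = df.apply(lambda col: col - col.mean())

# The function gets a whole column and returns one value: the frame collapses
# to a Series with one entry per column
totals = df.apply(lambda col: col.sum())

# The function gets one cell value and returns one cell value
# (DataFrame.applymap was renamed to DataFrame.map in pandas 2.1)
cellwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
squared = cellwise(lambda x: x ** 2)

print(centered.shape, totals.shape, squared.shape)
```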
Adding to the other answers, in a Series there are also map and apply.
Apply can make a DataFrame out of a Series; however, map will just put a Series in every cell of another Series, which is probably not what you want.
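To make that difference concrete, here is a minimal sketch. The Series s and the powers function are hypothetical, chosen only so that the function returns a Series for each element.

```python
import pandas as pd

s = pd.Series([1, 2, 3])


def powers(x):
    """Return a small Series for one element (hypothetical example function)."""
    return pd.Series({"x": x, "x_squared": x ** 2})


# Series.apply expands the returned Series into the columns of a DataFrame
expanded = s.apply(powers)

# Series.map keeps each returned Series as a single cell of the result
nested = s.map(powers)

print(type(expanded).__name__)
print(type(nested).__name__, type(nested.iloc[0]).__name__)
```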
Also, if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.
Map can use not only a function, but also a dictionary or another Series. Let's say you want to manipulate permutations.
Take
The square of this permutation is
You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.
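As a sketch of the idea, here is a made-up permutation (not necessarily the one from the answer) stored as a Series, where position i is sent to P[i]. Passing a Series or a dict to map looks each value up in that Series's index or in the dict's keys, so mapping P through itself composes the permutation with itself. The names P, P_squared, and names are assumptions for this example.

```python
import pandas as pd

# Hypothetical permutation of 0..4: position i is sent to P[i]
P = pd.Series([1, 2, 0, 4, 3])

# Mapping through another Series: each value of P is looked up in P's index,
# which gives i -> P[P[i]], i.e. the square of the permutation
P_squared = P.map(P)

# Mapping through a dictionary works the same way, with keys as lookup values
names = P.map({0: "zero", 1: "one", 2: "two", 3: "three", 4: "four"})

print(P_squared.tolist())  # [2, 0, 1, 3, 4]
print(names.tolist())      # ['one', 'two', 'zero', 'four', 'three']
```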
@jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation...
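One way to see why this works, as a sketch on a made-up frame: apply passes a whole column to the function, so if the function is itself vectorized (np.sqrt here), or pushes the work down to the elements with Series.map, the result is the same as a cell-wise call. The cell-wise line uses DataFrame.map where available, since applymap was renamed in pandas 2.1.

```python
import math

import numpy as np
import pandas as pd

# Made-up data, purely for illustration
df = pd.DataFrame({"a": [1.0, 4.0], "b": [9.0, 16.0]})

# np.sqrt is vectorized: it takes the whole column and returns a whole column
via_apply = df.apply(np.sqrt)

# math.sqrt only accepts scalars, so hand it to each element via Series.map
via_apply_map = df.apply(lambda col: col.map(math.sqrt))

# The cell-wise method calls math.sqrt once per cell
cellwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
via_cellwise = cellwise(math.sqrt)

print(via_apply.equals(via_apply_map), via_apply.equals(via_cellwise))
```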
My understanding:
From the function point of view:
If the function has variables that need to be compared within a column/row, use apply, e.g.: lambda x: x.max()-x.mean().
If the function is to be applied to each element:
1> If a column/row is located, use apply
2> If applied to the entire dataframe, use applymap
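The three cases above can be sketched like this. The frame df and the doubling function are made up for the example, and the last step uses DataFrame.map when the installed pandas has it, since applymap was renamed in pandas 2.1.

```python
import pandas as pd

# Made-up data, purely for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 60]})

# The function compares values within a column (default) or a row (axis=1)
spread_per_col = df.apply(lambda x: x.max() - x.mean())
spread_per_row = df.apply(lambda x: x.max() - x.mean(), axis=1)

# 1> Element-wise on one located column or row: Series.apply
a_doubled = df["a"].apply(lambda v: v * 2)
row0_doubled = df.loc[0].apply(lambda v: v * 2)

# 2> Element-wise on the entire frame: applymap (DataFrame.map in pandas 2.1+)
cellwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
all_doubled = cellwise(lambda v: v * 2)

print(spread_per_col)
print(spread_per_row)
print(all_doubled)
```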