Question:
Can you tell me when to use these vectorization methods with basic examples?
I see that map is a Series method whereas the rest are DataFrame methods. I got confused about the apply and applymap methods, though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples that illustrate the usage would be great!
Answer 1:
Straight from Wes McKinney's Python for Data Analysis book, p. 132 (I highly recommend this book):
Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:
In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
In [117]: frame
Out[117]:
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548
In [118]: f = lambda x: x.max() - x.min()
In [119]: frame.apply(f)
Out[119]:
b 1.133201
d 1.965980
e 2.829781
dtype: float64
Many of the most common array statistics (like sum and mean) are DataFrame methods,
so using apply is not necessary.
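As a small illustration of that point, the range computed with apply above can also be written with the built-in reductions alone. This is a sketch on freshly generated random data (the seed and the names via_apply and via_methods are my own, not from the book), so the numbers will differ from the output shown earlier:

```python
import numpy as np
import pandas as pd

# Same shape and labels as the book's frame, but with different random values.
frame = pd.DataFrame(
    np.random.default_rng(0).standard_normal((4, 3)),
    columns=list("bde"),
    index=["Utah", "Ohio", "Texas", "Oregon"],
)

# Column-wise range via apply: the lambda receives each column as a Series.
via_apply = frame.apply(lambda x: x.max() - x.min())

# The same result using only built-in DataFrame reductions, no apply needed.
via_methods = frame.max() - frame.min()

print(np.allclose(via_apply, via_methods))
```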
Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:
In [120]: format = lambda x: '%.2f' % x
In [121]: frame.applymap(format)
Out[121]:
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31
The reason for the name applymap is that Series has a map method for applying an element-wise function:
In [122]: frame['e'].map(format)
Out[122]:
Utah 1.28
Ohio -1.55
Texas 0.20
Oregon -0.31
Name: e, dtype: object
Summing up, apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
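For a side-by-side view, here is a minimal sketch of all three on a tiny made-up frame (the data and variable names are illustrative). One caveat that postdates the book: pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map, so the sketch picks whichever one the installed version provides.

```python
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]}, index=["Utah", "Ohio"])

# apply: the function receives a whole column (a Series) at a time.
col_range = frame.apply(lambda col: col.max() - col.min())

# applymap (DataFrame.map in pandas >= 2.1): the function receives one scalar.
fmt = lambda x: "%.2f" % x
elementwise = frame.map if hasattr(frame, "map") else frame.applymap
formatted = elementwise(fmt)

# Series.map: element-wise on a single column.
formatted_b = frame["b"].map(fmt)

print(col_range)
print(formatted)
print(formatted_b)
```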
Answer 2:
There's great information in these answers, but I'm adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don't have the rep to comment.
DataFrame.apply operates on entire rows or columns at a time.
DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.
There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences though, some of which were discussed in osa's answer.
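A rough sketch of the overlap and of one small difference, using made-up data: for a plain element-wise function the two give the same result, while Series.map additionally accepts na_action="ignore" to skip missing values.

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])  # the None becomes NaN in a float Series
fmt = lambda x: "%.1f" % x

# For an ordinary element-wise function, apply and map agree.
via_apply = s.apply(fmt)
via_map = s.map(fmt)

# map can leave missing values untouched instead of passing them to the function.
skip_missing = s.map(fmt, na_action="ignore")

print(via_apply.tolist())
print(via_map.tolist())
print(skip_missing.tolist())
```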
Answer 3:
Adding to the other answers, in a Series there are also map and apply.
Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.
In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[41]:
0 1
1 2
2 3
dtype: int64
In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]:
   0  1
0  1  1
1  2  2
2  3  3
In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]:
0    0    1
     1    1
     dtype: int64
1    0    2
     1    2
     dtype: int64
2    0    3
     1    3
     dtype: int64
dtype: object
Also, if I had a function with side effects, such as "connect to a web server", I'd probably use apply just for the sake of clarity.
series.apply(download_file_for_every_element)
map can take not only a function, but also a dictionary or another Series. Let's say you want to manipulate permutations.
Take
1 2 3 4 5
2 1 4 5 3
The square of this permutation is
1 2 3 4 5
1 2 5 3 4
You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1. (The code uses 0-based labels, so every value is one less than in the tables above.)
In [39]: p=pd.Series([1,0,3,4,2])
In [40]: p.map(p)
Out[40]:
0 0
1 1
2 4
3 2
4 3
dtype: int64
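The dictionary form works the same way as the Series form. A minimal sketch with made-up data (codes and lookup are illustrative names): each value is looked up as a key, and values with no matching key come back as NaN.

```python
import pandas as pd

codes = pd.Series(["a", "b", "c", "a"])
lookup = {"a": 1, "b": 2}  # deliberately has no entry for "c"

# Each element is used as a key into the dict; "c" has no match and becomes NaN.
mapped = codes.map(lookup)

# Passing a Series instead of a dict looks values up in that Series' index.
mapped_via_series = codes.map(pd.Series(lookup))

# The permutation example above, squared by mapping the Series through itself.
p = pd.Series([1, 0, 3, 4, 2])
p_squared = p.map(p)

print(mapped)
print(p_squared.tolist())
```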
Answer 4:
@jeremiahbuddha mentioned that apply works on row/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation....
frame.apply(np.sqrt)
Out[102]:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
frame.applymap(np.sqrt)
Out[103]:
               b         d         e
Utah         NaN  1.435159       NaN
Ohio    1.098164  0.510594  0.729748
Texas        NaN  0.456436  0.697337
Oregon  0.359079       NaN       NaN
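A likely explanation, sketched below with made-up data: apply still hands np.sqrt a whole column, and np.sqrt is itself vectorized, so the result merely looks element-wise. A function that only makes sense for scalars shows the difference, because it fails under apply. The helper names record_and_sqrt and clamp_negative are hypothetical.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]})

seen_types = []

def record_and_sqrt(x):
    # Record what apply actually hands to the function.
    seen_types.append(type(x).__name__)
    return np.sqrt(x)

result = frame.apply(record_and_sqrt)
print(set(seen_types))  # the function saw whole columns, not scalars

# A scalar-only function fails under apply: `x < 0` is a boolean Series
# there, which has no single truth value.
def clamp_negative(x):
    return 0 if x < 0 else x

try:
    frame.apply(clamp_negative)
    scalar_func_failed = False
except ValueError:
    scalar_func_failed = True

print(scalar_func_failed)
```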
Answer 5:
Just wanted to point out, as I struggled with this for a bit:
def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x
df.applymap(f)
df.describe()
This does not modify the DataFrame itself; the result has to be reassigned:
df = df.applymap(f)
df.describe()
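A small check of that behavior on made-up data. As a side note that is not part of the original answer, this particular clamping can also be done with the built-in DataFrame.clip, which avoids the element-wise Python function altogether. The sketch uses DataFrame.map where available, since applymap is deprecated from pandas 2.1 on.

```python
import pandas as pd

df = pd.DataFrame({"x": [-5, 50, 200000], "y": [10, -1, 100001]})

def f(x):
    # Clamp a single value into the range [0, 100000].
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

elementwise = df.map if hasattr(df, "map") else df.applymap

# The call returns a new DataFrame; df itself is left unchanged.
clamped = elementwise(f)
print(df["x"].tolist())
print(clamped["x"].tolist())

# Vectorized alternative for this specific function.
clipped = df.clip(lower=0, upper=100000)
print((clamped == clipped).all().all())
```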
Answer 6:
Probably the simplest explanation of the difference between apply and applymap:
apply takes the whole column as a parameter and then assigns the result to this column.
applymap takes each separate cell value as a parameter and assigns the result back to this cell.
NB: If apply returns a single value, you will get that value instead of the column after assignment, and will end up with just a row instead of a matrix.
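To make the NB concrete, here is a minimal sketch with made-up data: when the function reduces each column to one value, apply returns a Series with one entry per column; when it returns a same-length Series, the result keeps the DataFrame shape.

```python
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [9.0, 16.0]})

# One value per column: the result collapses to a Series indexed by column name.
reduced = frame.apply(lambda col: col.max())

# A same-length Series per column: the result is a DataFrame of the original shape.
same_shape = frame.apply(lambda col: col - col.max())

print(type(reduced).__name__, reduced.tolist())
print(type(same_shape).__name__, same_shape.shape)
```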
Answer 7:
My understanding:
From the function point of view:
If the function needs to compare values within a column/row, use apply, e.g.: lambda x: x.max() - x.mean().
If the function is to be applied to each element:
1> If a single column/row is selected, use apply
2> If it applies to the entire DataFrame, use applymap
majority = lambda x: x > 17
df2['legal_drinker'] = df2['age'].apply(majority)
def times10(x):
    if type(x) is int:
        x *= 10
    return x
df2.applymap(times10)