I have a dataframe:
df <- data.frame('a'=c(1,2,3,4,5), 'b'=c(1,20,3,4,50))
df
  a  b
1 1  1
2 2 20
3 3  3
4 4  4
5 5 50
and I want to create a new column based on existing columns. Something like this:
if (df[['a']] == df[['b']]) {
  df[['c']] <- df[['a']] + df[['b']]
} else {
  df[['c']] <- df[['b']] - df[['a']]
}
The problem is that the if condition is checked only for the first row. If I wrap the above if statement in a function and then use apply() (or mapply(), ...), the result is the same.
In Python/pandas I can use this:
df['c'] = df[['a', 'b']].apply(lambda x: x['a'] + x['b'] if (x['a'] == x['b']) \
else x['b'] - x['a'], axis=1)
I want something similar in R. So the result should look like this:
  a  b  c
1 1  1  2
2 2 20 18
3 3  3  6
4 4  4  8
5 5 50 45
Here is a slightly more confusing algebraic method. The idea is that the "minus" operator is turned on or off based on the test a == b.
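The answer's own snippet is not shown here, so what follows is only a sketch of one way the idea could be written. It assumes the sign of a is flipped by raising -1 to the logical a != b, which R coerces to 0 or 1.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# (a != b) is coerced to 0/1, so (-1)^(a != b) is +1 when a == b and -1 otherwise.
# The result is therefore b + a for equal rows and b - a for the rest.
df$c <- df$b + (-1)^(df$a != df$b) * df$a
df$c
# [1]  2 18  6  8 45
```

This stays fully vectorized, at the cost of being harder to read than an explicit condition.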
Another solution uses apply.
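As a rough illustration (not necessarily the code the answer used), apply can walk over the rows with MARGIN = 1. Each row arrives as a named numeric vector, so a scalar if/else is valid inside the function.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# MARGIN = 1 calls the function once per row; `row` is a named numeric vector
df$c <- apply(df[, c("a", "b")], 1, function(row) {
  if (row[["a"]] == row[["b"]]) {
    row[["a"]] + row[["b"]]
  } else {
    row[["b"]] - row[["a"]]
  }
})
df$c
# [1]  2 18  6  8 45
```

Note that apply converts the data frame to a matrix first, so this form is best kept to columns of a single type.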
One option is ifelse, which is a vectorized version of if/else. If we are doing this for each row, the if/else shown in the OP's pandas post can be done in either a for loop or lapply/sapply, but that would be inefficient in R.
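A minimal sketch of the ifelse approach, assuming the df from the question. The name res is only illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# ifelse evaluates the test for every element and picks from the
# "yes" or "no" vector position by position
res <- with(df, ifelse(a == b, a + b, b - a))
res
# [1]  2 18  6  8 45
```

Because the comparison and both arithmetic expressions are computed on whole columns, no explicit loop over rows is needed.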
This can otherwise be written so as to create the 'c' column in the original dataset.
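One plausible form of that assignment, offered as a sketch because the original snippet is missing, is to store the ifelse result directly in df:

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# Assign the vectorized result as a new column of the existing data frame
df$c <- with(df, ifelse(a == b, a + b, b - a))
df
#   a  b  c
# 1 1  1  2
# 2 2 20 18
# 3 3  3  6
# 4 4  4  8
# 5 5 50 45
```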
As the OP wants a similar option in R using if/else, a row-by-row version is also possible.
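A sketch of such a row-by-row version, assuming sapply over the row indices (the original code is not shown, so this is one possible reading). Inside the function each comparison involves a single value, so a plain if/else works as intended.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# Visit each row index; a[i] and b[i] are scalars, so `if` sees one condition
df$c <- sapply(seq_len(nrow(df)), function(i) {
  if (df$a[i] == df$b[i]) df$a[i] + df$b[i] else df$b[i] - df$a[i]
})
df$c
# [1]  2 18  6  8 45
```

This mirrors the pandas lambda with axis=1 most closely, but it will be slower than ifelse on large data.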
If you want an apply method, another way is to create a function and apply it with mapply.
The same can also be done using the dplyr package.
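The mapply route might look like the sketch below. The helper name combine_ab is hypothetical and does not come from the original answer.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# Hypothetical helper: works on one pair of scalars at a time
combine_ab <- function(a, b) {
  if (a == b) a + b else b - a
}

# mapply feeds the helper one element of each column per call
df$c <- mapply(combine_ab, df$a, df$b)
df$c
# [1]  2 18  6  8 45
```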
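For dplyr, a minimal sketch assuming the package is installed. It uses mutate with if_else, which is dplyr's stricter counterpart to base ifelse; the original answer may have used a different form.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# mutate adds the new column; if_else requires both branches to share a type
df <- df %>%
  mutate(c = if_else(a == b, a + b, b - a))
df$c
# [1]  2 18  6  8 45
```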