I'm trying to add a new column to a data frame based on several conditions from other columns. I have the following data:
> commute <- c("walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry", "walk", "bike", "subway", "drive", "ferry")
> kids <- c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Yes")
> distance <- c(1, 12, 5, 25, 7, 2, "", 8, 19, 7, "", 4, 16, 12, 7)
>
> df = data.frame(commute, kids, distance)
> df
   commute kids distance
1     walk  Yes        1
2     bike  Yes       12
3   subway   No        5
4    drive   No       25
5    ferry  Yes        7
6     walk  Yes        2
7     bike   No
8   subway   No        8
9    drive  Yes       19
10   ferry  Yes        7
11    walk   No
12    bike   No        4
13  subway  Yes       16
14   drive   No       12
15   ferry  Yes        7
If the following three conditions are met:
commute = walk OR bike OR subway OR ferry
AND
kids = Yes
AND
distance is less than 10
Then I'd like a new column called get.flyer to equal "Yes". The final data frame should look like this:
   commute kids distance get.flyer
1     walk  Yes        1       Yes
2     bike  Yes       12       Yes
3   subway   No        5
4    drive   No       25
5    ferry  Yes        7       Yes
6     walk  Yes        2       Yes
7     bike   No
8   subway   No        8
9    drive  Yes       19
10   ferry  Yes        7       Yes
11    walk   No
12    bike   No        4
13  subway  Yes       16       Yes
14   drive   No       12
15   ferry  Yes        7       Yes
We can use %in% to compare a column against multiple elements, and & to check that all conditions are TRUE.

It is better to create the data.frame with stringsAsFactors=FALSE, since in R versions before 4.0.0 the default is TRUE. With that default, str(df) shows that all the columns are of factor class. Also, if there are missing values, NA can be used instead of "" to avoid converting the class of a numeric column to something else.

If we rewrite the creation of 'df' this way, the comparison can be simplified, because distance stays numeric and needs no conversion.
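As a minimal sketch of the approach described above (not necessarily the code originally posted), the question's data can be rebuilt with NA for the two missing distances and the three conditions combined with %in% and &. The helper name flag and the !is.na() guard are assumptions added for illustration.

```r
# Same data as in the question, but with NA instead of "" for missing distances
commute  <- rep(c("walk", "bike", "subway", "drive", "ferry"), times = 3)
kids     <- c("Yes", "Yes", "No", "No", "Yes", "Yes", "No", "No",
              "Yes", "Yes", "No", "No", "Yes", "No", "Yes")
distance <- c(1, 12, 5, 25, 7, 2, NA, 8, 19, 7, NA, 4, 16, 12, 7)

df <- data.frame(commute, kids, distance, stringsAsFactors = FALSE)

# 'flag' is a hypothetical helper: TRUE only where all three conditions hold.
# The !is.na() guard keeps rows with a missing distance from producing NA.
flag <- df$commute %in% c("walk", "bike", "subway", "ferry") &
  df$kids == "Yes" &
  !is.na(df$distance) & df$distance < 10

# flag + 1 is 1 for FALSE and 2 for TRUE, so it indexes into c("", "Yes")
df$get.flyer <- c("", "Yes")[flag + 1]
df
```

With the conditions exactly as stated in the question, rows 1, 5, 6, 10 and 15 are marked "Yes"; rows with distance 12 or 16 are not, because they fail the distance < 10 test.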
For better understanding, some people prefer ifelse.
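A hedged sketch of what an ifelse version might look like, using a four-row subset of the question's data (rows 1, 7, 9 and 5) so the example stays self-contained. It assumes the missing distance is stored as NA rather than "".

```r
# Four rows taken from the question's data, with NA for the missing distance
df <- data.frame(
  commute  = c("walk", "bike", "drive", "ferry"),
  kids     = c("Yes", "No", "Yes", "Yes"),
  distance = c(1, NA, 19, 7),
  stringsAsFactors = FALSE
)

# ifelse(test, yes, no) picks "Yes" where the combined test is TRUE, "" otherwise
df$get.flyer <- ifelse(
  df$commute %in% c("walk", "bike", "subway", "ferry") &
    df$kids == "Yes" &
    df$distance < 10,
  "Yes",
  ""
)
df$get.flyer
```

Note that a row meeting the first two conditions but with an NA distance would make the test NA, and ifelse would then return NA for that row rather than "".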
This can also be done easily with base R methods.

The solution is already pointed out by @akrun. I'd like to present it in a more 'wrapped up' way.
You can use the ifelse function to create a column based on one (or more) conditions. But first you have to change the 'encoding' of missing values in the distance column. You used "" to indicate a missing value; this, however, converts the entire column to string and inhibits numerical comparison (distance < 10 no longer works as a numeric test). The R way of indicating a missing value is NA, so your column definition of distance should use NA in place of each "".

The ifelse statement then tests the three conditions together and returns "Yes" where they all hold.

Optional: Consider encoding your other columns in a different way as well:
TRUE and FALSE instead of "Yes" and "No" for the kids variable
factor for commute

Example: check if first_column_name is contained in second_column_name and write the result to new_column
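One possible reading of this example, sketched in base R: "contained in" is taken here to mean a row-by-row substring check, which is an assumption, and the sample values are made up for illustration. Only the three column names come from the text above.

```r
# Hypothetical sample data; only the column names come from the answer
df <- data.frame(
  first_column_name  = c("walk", "bike", "ferry"),
  second_column_name = c("walk to work", "drive to work", "ferry and bike"),
  stringsAsFactors = FALSE
)

# For each row, test whether the first value occurs as a literal substring
# of the second value (fixed = TRUE turns off regular-expression matching)
df$new_column <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  df$first_column_name,
  df$second_column_name,
  USE.NAMES = FALSE
)
df
```

If membership in the whole column is meant instead, df$first_column_name %in% df$second_column_name would be the analogous check.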
Details: