I'm still learning how to translate SAS code into R, and I get warnings. I need to understand where I'm making mistakes. What I want to do is create a variable that summarizes and distinguishes three statuses of a population: mainland, overseas, foreigner. I have a database with 2 variables:
- id nationality: idnat (french, foreigner)
If idnat is french then:
- id birthplace: idbp (mainland, colony, overseas)
I want to summarize the info from idnat and idbp into a new variable called idnat2:
- status: k (mainland, overseas, foreigner)
All these variables use "character type".
Results expected in column idnat2:
  idnat   idbp     idnat2
1 french  mainland mainland
2 french  colony   overseas
3 french  overseas overseas
4 foreign foreign  foreign
Here is the SAS code I want to translate into R:
if idnat = "french" then do;
    if idbp in ("overseas","colony") then idnat2 = "overseas";
    else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;
Here is my attempt in R:
if(idnat=="french"){
    idnat2 <- "mainland"
} else if(idbp=="overseas"|idbp=="colony"){
    idnat2 <- "overseas"
} else {
    idnat2 <- "foreigner"
}
I receive this warning:
Warning message:
In if (idnat=="french") { :
the condition has length > 1 and only the first element will be used
I was advised to use a nested ifelse instead because it is simpler, but I get more warnings:
idnat2 <- ifelse (idnat=="french", "mainland",
ifelse (idbp=="overseas"|idbp=="colony", "overseas")
)
else (idnat2 <- "foreigner")
According to the warning message, the length is greater than 1, so only what's between the first brackets will be taken into account. Sorry, but I don't understand what this length has to do with my code. Does anybody know where I'm going wrong?
Try something like the following:
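A minimal sketch of the nested ifelse, mirroring the SAS logic from the question. The data frame name df is an assumption made for illustration, and its rows are rebuilt from the expected-results table. Note that this follows the SAS code and returns "foreigner" for non-French rows, whereas the table in the question shows "foreign".

```r
# Hypothetical data frame rebuilt from the table in the question
df <- data.frame(
  idnat = c("french", "french", "french", "foreign"),
  idbp  = c("mainland", "colony", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Outer ifelse: is the person French?
# Inner ifelse: if French, was the birthplace overseas or a colony?
df$idnat2 <- ifelse(
  df$idnat == "french",
  ifelse(df$idbp %in% c("overseas", "colony"), "overseas", "mainland"),
  "foreigner"
)

print(df)
```

Every ifelse call needs all three arguments (test, yes, no); the attempt in the question leaves out the "no" value of the inner call and puts a stray else after the closing parenthesis.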
Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized, meaning they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logicals (i.e., if(c("french","foreigner")=="french") doesn't work), so R gives you the warning you're receiving (in R 4.2.0 and later, this is an error rather than a warning).

By contrast, ifelse is vectorized, so it can take your vectors (aka input variables) and test the logical condition on each of their elements, like you're used to in SAS. An alternative way to wrap your head around this would be to build a loop using if and else statements (as you've started to do here), but the vectorized ifelse approach will be more efficient and generally involve less code.
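For comparison, the loop alternative mentioned above might look like the sketch below. The vectors are assumed to be standalone character vectors with the values from the question's table. Inside the loop, each if tests a single element, so the condition always has length 1.

```r
# Assumed standalone vectors with the values from the question's table
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Pre-allocate the result, then fill it one element at a time
idnat2 <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] == "french") {
    if (idbp[i] %in% c("overseas", "colony")) {
      idnat2[i] <- "overseas"
    } else {
      idnat2[i] <- "mainland"
    }
  } else {
    idnat2[i] <- "foreigner"
  }
}

print(idnat2)
```

This gives the same values as the nested ifelse, but with more code and an explicit index.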