replace loops with apply family functions (or dply

Posted 2020-04-30 19:27

I have created this representative data frame that assigns condition categories using a for loop.

df <- data.frame(Date=c("08/29/2011", "08/29/2011", "08/30/2011", "08/30/2011", "08/30/2011", "08/29/2012", "08/29/2012", "01/15/2012", "08/29/2012"),
             Time=c("09:45", "10:00", "13:00", "13:30", "10:14", "9:09", "11:23", "17:06", "12:20"),
             Diff = c(0.2,4.3,6.5,15.0, 16.5, 31, 30.2, 21.9, 1.9))

library(dplyr)  # provides %>% and mutate()

df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent", "TBD"))

for (i in 1:nrow(df1)) {
  if (df1$Diff[i] > 3 && df1$Diff[i] <= 10) {
    df1$Accuracy[i] <- "Good"
  }
  if (df1$Diff[i] > 10 && df1$Diff[i] <= 15) {
    df1$Accuracy[i] <- "Fair"
  }
  if (df1$Diff[i] > 15 && df1$Diff[i] <= 30) {
    df1$Accuracy[i] <- "Poor"
  }
  if (df1$Diff[i] > 30) {
    df1$Accuracy[i] <- "Unacceptable"
  }
}

My actual dataset is very large, and from what I have read, for loops are usually not the most efficient way to code in R. I believe I can do the same thing by creating a logical vector for each condition, where TRUE marks the rows that meet the condition. Then I can assign the values by subsetting, for example df1$Accuracy[Good] <- "Good". However, I cannot figure out how to create the logical vectors using the apply family of functions or dplyr functions. (Any solution that avoids for loops is also welcome.) If for loops are the better way to go, that would also be helpful to know.

Here are my failed attempts. They return incorrect NAs or incorrect logical vectors. One of the many things I do not understand is how lapply knows whether to go over columns or rows.

Good<-apply(df1, 1, function(x) ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE)) #logical, TRUE where condition is true 
Good<-unlist(lapply(df1$Diff,  function(x) {(ifelse(df1$Diff[x]>3&& df1$Diff[x]<=10, TRUE, FALSE))}))

Update: Nested ifelse statements will work, but any suggestions on how to use apply are still welcome.

df1 <- df %>%
  mutate(Accuracy = ifelse(Diff <= 3, "Excellent",
                    ifelse(Diff > 3 & Diff <= 10, "Good",
                    ifelse(Diff > 10 & Diff <= 15, "Fair",
                    ifelse(Diff > 15 & Diff <= 30, "Poor",
                    ifelse(Diff > 30, "Unpublishable", "TBD"))))))

1 answer
爷、活的狠高调
#2 · 2020-04-30 20:01

You could use case_when from dplyr:

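df1 <- df %>%
  mutate(Accuracy = case_when(
    Diff <= 3  ~ "Excellent",      # conditions are checked in order; the first match wins
    Diff <= 10 ~ "Good",
    Diff <= 15 ~ "Fair",
    Diff <= 30 ~ "Poor",
    Diff > 30  ~ "Unpublishable",
    TRUE       ~ "TBD"             # fallback for rows matched by nothing above, e.g. NA
  ))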

 df1
        Date  Time Diff      Accuracy
1 08/29/2011 09:45  0.2     Excellent
2 08/29/2011 10:00  4.3          Good
3 08/30/2011 13:00  6.5          Good
4 08/30/2011 13:30 15.0          Fair
5 08/30/2011 10:14 16.5          Poor
6 08/29/2012  9:09 31.0 Unpublishable
7 08/29/2012 11:23 30.2 Unpublishable
8 01/15/2012 17:06 21.9          Poor
9 08/29/2012 12:20  1.9     Excellent
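
As for the logical vectors: comparison operators in R are already vectorized, so no apply call is needed to build them. The attempts in the question go wrong because lapply hands each element of df1$Diff to the function as x, so x is the value itself and df1$Diff[x] then indexes by that value rather than by row. apply(df1, 1, ...) converts the data frame to a matrix first, which here makes every row a character vector. Also note that && is intended for single values, while & works element-wise. Below is a rough base R sketch of the subsetting idea, plus cut() as one possible alternative to case_when. It assumes the same cut points as above, and the names Good, Good_apply and Accuracy are only illustrative.

```r
# Same Diff values as in the question
Diff <- c(0.2, 4.3, 6.5, 15.0, 16.5, 31, 30.2, 21.9, 1.9)

# Vectorized comparison: one TRUE/FALSE per element, using element-wise `&`
Good <- Diff > 3 & Diff <= 10

# If an apply-family call is really wanted, x is each value (not an index)
Good_apply <- vapply(Diff, function(x) x > 3 && x <= 10, logical(1))

# cut() bins a numeric vector; by default intervals are closed on the right,
# i.e. (-Inf, 3], (3, 10], (10, 15], (15, 30], (30, Inf)
Accuracy <- as.character(cut(
  Diff,
  breaks = c(-Inf, 3, 10, 15, 30, Inf),
  labels = c("Excellent", "Good", "Fair", "Poor", "Unpublishable")
))
```

With Good in hand, df1$Accuracy[Good] <- "Good" does the assignment from the question in one step. On the rows-versus-columns point: lapply simply iterates over the elements of whatever it is given, and a data frame is a list of columns, so lapply(df1, ...) goes column by column, while apply takes an explicit MARGIN argument (1 for rows, 2 for columns).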