Empty rows in list as NA values in data.frame in R

Posted 2019-08-06 14:17

Question:

I have a dataframe as follows:

hospital <- c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL", "FAIRBANKS MEMORIAL HOSPITAL", 
          "CRESTWOOD MEDICAL CENTER", "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL", 
          "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL")
state <- c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR")
rank <- c(1,2,3,1,2,1,2,3)
df <- data.frame(hospital, state, rank)
df

                                 hospital    state     rank
    1   PROVIDENCE ALASKA MEDICAL CENTER        AK        1
    2   ALASKA REGIONAL HOSPITAL                AK        2
    3   FAIRBANKS MEMORIAL HOSPITAL             AK        3
    4   CRESTWOOD MEDICAL CENTER                AL        1
    5   BAPTIST MEDICAL CENTER EAST             AL        2
    6   ARKANSAS HEART HOSPITAL                 AR        1
    7   MEDICAL CENTER NORTH LITTLE ROCK        AR        2
    8   CRITTENDEN MEMORIAL HOSPITAL            AR        3

I would like to create a function, rankall, that takes rank as an argument and returns the hospitals of that rank for each state, with NAs returned if the state does not have a hospital that matches the given rank. For example, I want output of rankall(rank=3) to look like this:

                           hospital     state 
    AK  FAIRBANKS MEMORIAL HOSPITAL        AK    
    AL                         <NA>        AL
    AR CRITTENDEN MEMORIAL HOSPITAL        AR    

I've tried:

rankall <- function(rank) {
split_by_state <- split(df, df$state)
ranked_hospitals <- lapply(split_by_state, function (x) {
    x[(x$rank==rank), ]
})
combined_ranked_hospitals <- do.call(rbind, ranked_hospitals)
return(combined_ranked_hospitals[ ,1:2])
}

But rankall(rank=3) returns:

                                 hospital     state     
    AK       FAIRBANKS MEMORIAL HOSPITAL         AK                        
    AR       CRITTENDEN MEMORIAL HOSPITAL        AR             

This leaves out the NA values that I need to keep track of. Is there a way for R to recognize the empty rows in my list object within my function as NAs, rather than as empty rows? Is there another function besides lapply that would be more useful for this task?

[ Note: This dataframe is from the Coursera R Programming course. This is also my first post on Stackoverflow, and my first time learning programming. Thank you to all who offered solutions and advice, this forum is fantastic. ]

Answer 1:

You just need an if/else in your function:

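rankall <- function(rank) {
    split_by_state <- split(df, df$state)
    ranked_hospitals <- lapply(split_by_state, function (x) {
        indx <- x$rank==rank
        if(any(indx)){
            return(x[indx, ])
        } else {
            # no hospital with this rank: keep one row for the state, blank the name
            out = x[1, ]
            out$hospital = NA
            return(out)
        }
    })
    # bind the per-state rows back together, keeping hospital and state
    combined_ranked_hospitals <- do.call(rbind, ranked_hospitals)
    return(combined_ranked_hospitals[ ,1:2])
}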


Answer 2:

Here's an alternative approach:

rankall <- function(rank) {  
  do.call(rbind, lapply(split(df, df$state), function(df) { 
    tmp <- df[df$rank == rank, 1:2]   
    if (!nrow(tmp)) return(transform(df[1, 1:2], hospital = NA)) else return(tmp) 
  })) 
}
rankall(3)
#                          hospital state
#   AK  FAIRBANKS MEMORIAL HOSPITAL    AK
#   AL                         <NA>    AL
#   AR CRITTENDEN MEMORIAL HOSPITAL    AR
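
On the question of whether something other than lapply would suit this task: one option is a left join with merge(). The sketch below is a minimal illustration, not part of the answers in this thread. It assumes the df from the question, built with stringsAsFactors = FALSE, and the name rankall_merge is made up for the example. States with no matching hospital get NA because all.x = TRUE keeps every row of the state table.

```r
# Example data from the question (character columns, not factors)
df <- data.frame(
  hospital = c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL",
               "FAIRBANKS MEMORIAL HOSPITAL", "CRESTWOOD MEDICAL CENTER",
               "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL",
               "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL"),
  state = c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR"),
  rank = c(1, 2, 3, 1, 2, 1, 2, 3),
  stringsAsFactors = FALSE
)

# rankall_merge is a hypothetical name used only for this sketch
rankall_merge <- function(df, rank) {
  # one row per state, so every state survives the join
  all_states <- data.frame(state = sort(unique(df$state)),
                           stringsAsFactors = FALSE)
  # hospitals that actually have the requested rank
  hits <- df[df$rank == rank, c("hospital", "state")]
  # left join: states without a hit get NA in the hospital column
  out <- merge(all_states, hits, by = "state", all.x = TRUE)
  out[, c("hospital", "state")]
}

res <- rankall_merge(df, 3)
res
```

Unlike the split/lapply versions, the rows here are numbered 1 to 3 rather than named by state. Passing df as an argument instead of reading it from the global environment is a design choice of the sketch.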


Answer 3:

Here is another dplyr approach.

library(dplyr)

# x is used as a position within each state group
fun1 <- function(x) {
    group_by(df, state) %>%
        summarise(hospital = hospital[x],
                  rank = nth(rank, x))
}

# fun1(3)
#Source: local data frame [3 x 3]
#
#  state                     hospital rank
#1    AK  FAIRBANKS MEMORIAL HOSPITAL    3
#2    AL                           NA   NA
#3    AR CRITTENDEN MEMORIAL HOSPITAL    3
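
One caveat: fun1 picks the x-th row of each state, which equals the hospital of rank x only because df is already sorted by rank within each state. A minimal base R sketch of looking the rank up by value with match() instead, using the AK rows in shuffled order (the names hosp_ak and rank_ak are illustrative, not from the answer):

```r
# AK rows from the question, deliberately out of rank order
hosp_ak <- c("ALASKA REGIONAL HOSPITAL",
             "FAIRBANKS MEMORIAL HOSPITAL",
             "PROVIDENCE ALASKA MEDICAL CENTER")
rank_ak <- c(2, 3, 1)

# Positional indexing returns whatever happens to sit in slot 1
by_position <- hosp_ak[1]

# match() finds the position whose rank equals the requested value
by_rank <- hosp_ak[match(1, rank_ak)]

# A rank that does not exist gives NA from match(), and indexing by NA gives NA
missing_rank <- hosp_ak[match(4, rank_ak)]
```

Inside summarise() the same idea would read hospital[match(x, rank)]. That variant is a suggestion, not something the original answer uses.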


Answer 4:

I think this is a good use of dplyr. Only thing that's weird is summarize complains when I use NA instead of "NA". Anyone have thoughts on why?

library(dplyr)
rankall <- function(chosen_rank){
  group_by(df, state) %>%
    summarize(hospital = ifelse(length(hospital[rank==chosen_rank])!=0,
                                as.character(hospital[rank==chosen_rank]), "NA"),
              rank = chosen_rank)
}

rankall(1)
rankall(2)
rankall(3)
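
On why summarize complains about a bare NA: a likely explanation, which is an assumption and not confirmed anywhere in this thread, is a type mismatch. A bare NA is logical, so ifelse() returns a logical NA for states with no match and a character string for the others, and a summary column whose type differs between groups can be rejected. The base R sketch below shows only the type difference and the typed missing value NA_character_; it does not run dplyr.

```r
# A bare NA is a logical value; NA_character_ is the character-typed missing value
class(NA)
class(NA_character_)

# With a single TRUE/FALSE test, ifelse() hands back the branch it selected,
# so the result type depends on which branch was taken
no_match  <- ifelse(FALSE, "FAIRBANKS MEMORIAL HOSPITAL", NA)
has_match <- ifelse(TRUE,  "FAIRBANKS MEMORIAL HOSPITAL", NA)
class(no_match)
class(has_match)

# With NA_character_ both cases are character, and the value is a real
# missing value rather than the two-letter string "NA"
typed_no_match <- ifelse(FALSE, "FAIRBANKS MEMORIAL HOSPITAL", NA_character_)
class(typed_no_match)
is.na(typed_no_match)
is.na("NA")
```

If that explanation holds, passing NA_character_ instead of "NA" as the last argument of ifelse() in rankall would keep the column character and make is.na() work on the result.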