Error handling in a loop that outputs a dataframe

Posted 2019-09-14 20:15

Problem

I am trying to write a loop in R where I am anticipating some errors. Instead of stopping the loop, I am trying to figure out a way to capture the error message and include that information in the output and continue the loop.

NOTE about data: This data is from the NOAA website and it is the Southern Oscillation Index data. The second set is trivial (data2) and is presented solely to generate an error.

Here is a slightly trivial example of the type of loop I am trying to create. Take some data from the web, perform some manipulations on it, store it (as df) then get more data, perform the same manipulation and append it (via rbind) to the first data:

data_spec <- c("data", "data2")
df <- c()
for (i in data_spec){
  raw <- read.csv(
    paste0("https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/",i,".csv"),
           skip = 2, col.names = c("Date","SOI") )
  u <- data.frame(data_spec = i, mean_soi = mean(raw$SOI))
  df <- rbind(df, u)
}

Because https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/data2.csv is not a valid URL, the loop stops and throws an error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open URL 'https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/data2.csv': HTTP status was '404 Not Found'

Expected Output

I am trying to achieve an output like this, where the error message is captured as an object and appended accordingly:

  data_spec                                             mean_soi
1      data                                            0.1223618
2     data2 Error in file(file, rt) : cannot open the connection

Attempts to figure this out

So I think I am clear that I need to use tryCatch here. If I use it like so:

data_spec <- c("data", "data2")
df <- c()
for (i in data_spec){
  tryCatch({
  raw <- read.csv(
    paste0("https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/",i,".csv"),
    skip = 2, col.names = c("Date","SOI") )
  u <- data.frame(data_spec = i, mean_soi = mean(raw$SOI))
  df <- rbind(df, u)
  }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}

the loop continues along, but the error message isn't captured in the output (not that I was expecting this here).

Another option is to use the function shown in demo(error.catching). I modified that function a little so that the message is captured and returned:

tryCatch_mod <- function(expr)
{
  W <- NULL
  w.handler <- function(w){ # warning handler
    W <<- w
    invokeRestart("muffleWarning")
  }
  temp <- list(value = withCallingHandlers(tryCatch(expr, error = function(e) e),
                                   warning = w.handler),
       warning = W)

  unlist(temp[[2]])$message
}

This returns the captured message when using "data2":

tryCatch_mod(read.csv("https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/data2.csv",
                      skip = 2, col.names = c("Date","SOI")))

What I can't figure out

How do I include this function (or something that accomplishes the same thing) so that the output is conditional on whether there is an error? That is, how do I write my loop so that it essentially says:

  • When there is an error, skip any manipulations and append i and the error message into df
  • When there is NOT an error, perform the manipulation and append results onto the same df

1 answer
仙女界的扛把子
Reply #2 · 2019-09-14 20:31

I rebuilt your data.frame with one additional column to store error messages, because each column of an R data.frame should hold a single type. If error messages and numeric results share one column, everything in it is coerced to a common type (here, character), which gets messy. The handle_i function is the work you want to carry out for each i. The error handling is on the tryCatch line: whenever an error occurs, it returns the error message you want to store (it does nothing about warnings). Finally, the result is written into the matching columns of the data.frame.

handle_i <- function(i){
    raw <- read.csv(
        paste0("https://www.ncdc.noaa.gov/teleconnections/enso/indicators/soi/",i,".csv"),
        skip = 2, col.names = c("Date","SOI") )
    list(mean_soi = mean(raw$SOI))
}

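data_spec <- c("data", "data2")
# One row per spec; messages get their own column so mean_soi stays numeric
df <- data.frame(data_spec = data_spec, mean_soi = NA, message = "", stringsAsFactors = FALSE)
for (i in seq_along(data_spec)) {
    # On success r is list(mean_soi = ...); on error it is list(message = ...)
    r <- tryCatch(handle_i(data_spec[i]), error = function(e) list(message = e$message))
    # names(r) picks the column that matches whichever list came back
    df[i, names(r)] <- r
}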
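
If you want to try the pattern without hitting the NOAA server, here is a minimal offline sketch. It assumes a hypothetical stand-in, fake_read(), which is not part of the original code: it returns a few made-up numbers for "data" and raises an error for anything else. The numbers are illustrative only, not real SOI values.

```r
# fake_read() is a hypothetical stand-in for the read.csv() download step.
# It returns made-up values for known names and raises an error otherwise.
fake_read <- function(name) {
  known <- list(data = c(0.5, -0.25, 0.125))  # illustrative values only
  if (!name %in% names(known)) {
    stop("cannot open the connection")
  }
  known[[name]]
}

# Same shape as handle_i in the answer: return a named list with the result.
handle_i <- function(name) {
  list(mean_soi = mean(fake_read(name)))
}

data_spec <- c("data", "data2")
df <- data.frame(data_spec = data_spec, mean_soi = NA_real_, message = "",
                 stringsAsFactors = FALSE)

for (i in seq_along(data_spec)) {
  # Success gives list(mean_soi = ...); an error gives list(message = ...)
  r <- tryCatch(handle_i(data_spec[i]),
                error = function(e) list(message = conditionMessage(e)))
  # Write into whichever column the returned list names
  df[i, names(r)] <- r
}

print(df)
```

After the loop, the row for "data" has a numeric mean_soi and an empty message, while the row for "data2" keeps mean_soi as NA and holds the error text in message. conditionMessage(e) is used here in place of e$message; both return the same string for ordinary errors.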