How to prevent truncation of error messages in R

Posted 2019-02-17 00:30

Question:

I am querying a database in R using RJDBC. The queries are built up from data which is read in from a file. These queries can get very long, and can potentially include nonexistent columns (resulting in an error).

Below is a simplified example: it takes the file as input and then runs the two queries generated from it.

table     column
drinks    cost
drinks    sugar
drinks    volume
food      cost

SELECT column, cost, sugar FROM drinks;
SELECT cost FROM food;

Because these queries can get very long, any errors from the database are often truncated before the useful information. One of my current errors reads:

ERROR [2018-05-16 16:53:07] Error processing table data_baseline_biosamples for DAR-2018-00008 original error message: Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for ", : Unable to retrieve JDBC result set for SELECT ed.studyid, {very long list of columns} ,ct.nmr_xl_vldl_pl,ct.nmr_xl_

Because the database error includes the entire query before the key information, the truncation removes valuable information for solving the problem.

In this case the error message probably ends with something like this:

(line 1, Table 'data_biosamples' owned by 'littlefeltfangs' does not contain column 'sample_source'.)

How do I record the full error message sent by the database, or otherwise extract the final part of that message?

I am capturing the error in a tryCatch and passing the error into a log file using futile.logger. The total error length when truncated is 8219 characters, with 8190 of those appearing to be from the database.

Answer 1:

It's not RJDBC that's cutting off the error message.

See ?stop:

Errors will be truncated to getOption("warning.length") characters, default 1000.

So you can set the option:

stop(paste(rep(letters, 50L), collapse = ''))
options(warning.length = 2000L)
stop(paste(rep(letters, 50L), collapse = ''))

You'll notice the truncation in the first message, but not the second.

For my own helper functions catching errors from RJDBC, I use something like:
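If changing the option for the whole session feels too invasive, one possibility is to raise it only around the call that may fail and restore the previous value afterwards. The following is a minimal sketch: `with_long_errors` is a hypothetical helper name, and 8000L is simply a value assumed to sit inside the allowed range.

```r
# Hypothetical helper: evaluate `expr` with a larger warning.length,
# then restore whatever value was set before.
with_long_errors <- function(expr, len = 8000L) {
  old <- options(warning.length = len)  # options() returns the previous values
  on.exit(options(old), add = TRUE)     # restore on exit, even after an error
  expr                                  # lazily evaluated here, under the new limit
}

before <- getOption("warning.length")
inside <- with_long_errors(getOption("warning.length"))
after  <- getOption("warning.length")
```

Because `expr` is only evaluated inside the function, the larger limit is in effect while it runs, and the session setting is untouched once the call returns.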

result = tryCatch(<some DB operation>, error = identity)

Then do regular expressions on result$message to test for various common errors & produce a friendlier error message.
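As a rough illustration of that approach, the sketch below simulates a failing DB call with stop() and maps the caught message to a shorter one. The patterns, the replacement texts, and the `friendly_error` name are all assumptions made up for this example; they are not messages that any particular driver is guaranteed to produce.

```r
# Hypothetical lookup table: regex pattern -> friendlier message.
known_errors <- c(
  "does not contain column" = "The query references a column that does not exist.",
  "[Cc]onnection refused"   = "Could not reach the database server."
)

friendly_error <- function(result) {
  if (!inherits(result, "error")) return(NULL)   # nothing went wrong
  msg <- conditionMessage(result)
  for (pattern in names(known_errors)) {
    if (grepl(pattern, msg)) return(known_errors[[pattern]])
  }
  msg                                            # unknown error: keep the original text
}

# Simulated failure standing in for a real DB operation.
result <- tryCatch(
  stop("Unable to retrieve JDBC result set for SELECT cost FROM food ",
       "(line 1, Table 'food' owned by 'someone' does not contain column 'cost'.)"),
  error = identity
)
friendly <- friendly_error(result)
```

Keeping the patterns in one named vector makes it easy to add new cases as they turn up in the logs.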


Not mentioned in ?stop is that warning.length can only be in a fairly narrow range of values. To explore this I ran the following code:

can = logical(16000L)
for (ii in seq_along(can)) {
  res = tryCatch(options(warning.length = ii),
                 error = identity)
  if (inherits(res, 'error')) {
    can[ii] = FALSE
  } else can[ii] = TRUE
}

png('~/Desktop/warning_valid.png')
plot(can, las = 1L, ylab = 'Valid option value?',
     main = 'Valid option values for `warning.length`',
     type = 's', lwd = 3L, log = 'x')
first = which.max(can)
# mark the first valid value and the first invalid value after it
switches = c(first, first + which.min(can[first:length(can)]) - 1L)
abline(v = switches, lty = 2L, col = 'red', lwd = 2L)
axis(side = 1L, at = switches, las = 2L, cex = .5)
dev.off()

Beats me where these limits come from (values from 100 through 8170 are accepted); they seem fairly arbitrary (8192 is the nearest power of 2). Here is the place in the R source where these values are hard-coded in. I've asked about this on r-devel; I'll update this post accordingly.

FWIW, in my own error-parsing helper function (built for querying PrestoDB), I have this line:

core_msg = gsub('.*(Query failed.*)\\)\\s*$', '\\1', result$message)

This is tailored to the error messages that come out of PrestoDB, so you'll have to customize it yourself, but the idea is to clip out the part of the error message that just regurgitates the query itself.
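For the Ingres-style message in the question, an adapted pattern might look like the sketch below. It assumes the useful part is the final parenthesised "(line N, ...)" group, as the question guesses, and it can only work if that tail is still present in the captured message. The example message is made up, with the query shortened.

```r
# Example message shaped like the one in the question (query shortened).
msg <- paste0(
  "Unable to retrieve JDBC result set for SELECT ed.studyid FROM data_biosamples ",
  "(line 1, Table 'data_biosamples' owned by 'littlefeltfangs' ",
  "does not contain column 'sample_source'.)"
)

# Drop everything before the last "(line N, ..." group and the closing ")".
core_msg <- sub(".*\\((line [0-9]+, .*)\\)[[:space:]]*$", "\\1", msg)
```

If the pattern does not match, sub() returns the message unchanged, so the full text is still available as a fallback.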

Alternatively, of course, you can split result$message into pieces that each fit within the allowed warning.length and print them out separately.
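A minimal sketch of that splitting idea follows. The `split_message` name and the chunk size of 8000 are assumptions for illustration, and the long string here is synthetic rather than a real database error.

```r
# Split a long string into consecutive pieces of at most `chunk_size` characters.
split_message <- function(msg, chunk_size = 8000L) {
  starts <- seq.int(1L, max(nchar(msg), 1L), by = chunk_size)
  substring(msg, starts, starts + chunk_size - 1L)
}

long_msg <- paste(rep(letters, 700L), collapse = "")  # 18200 characters
chunks <- split_message(long_msg)

# Write each piece separately, here to stderr.
for (chunk in chunks) cat(chunk, "\n", sep = "", file = stderr())

# Often only the end matters: keep the last 200 characters.
tail_msg <- substring(long_msg, nchar(long_msg) - 199L)
```

The same pieces could be passed one at a time to whatever logger is in use instead of cat().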



Answer 2:

While not a solution for the general case, the solution to my specific case was to move from the RJDBC package to the odbc package (not the RODBC package). Both are based on DBI, which means that switching should be as simple as installing an ODBC driver and replacing your dbConnect parameters. Error messages produced by the odbc package do not include the original query, so they do not run into the truncation issue I was struggling with.

For comparison, this is the complete set of changes I've needed to make:

Original:

request_settings[['db_con']]<-dbConnect(global_settings$ingresJDBC,url="jdbc:ingres://localhost:IJ7/myvnode::mydatabase;")

New:

request_settings[['db_con']]<-dbConnect(odbc::odbc(),driver="Ingres",server="myvnode",database="mydatabase")

The error messages are much more compact. E.g.,

Error in new_result(connection@ptr, statement): nanodbc/nanodbc.cpp:1344: 42501: [Actian][Ingres ODBC Driver][Ingres]line 1, Table 'mytable' owned by 'littlefeltfangs' does not contain column 'mycolumn'.

The documentation for the odbc package (what there is of it) can be found here.