Error when trying to interactively load data file

Posted 2019-09-02 05:15

Question:

While debugging my problem with retrieving attributes (Can I access R data objects' attributes without fully loading objects from file?), and based on advice here on SO, I switched from using save() and load() to saveRDS() and readRDS(), respectively.

My investigation (via non-interactive debug printing) showed the following:

  1. immediately after initial saveRDS() the saved object contains the attribute in question;

  2. an interactive R session, performed after the initial run of the script, shows that the attribute is absent from the saved object;

  3. the findings above explain the failure to retrieve the attribute during the next run of the script, which I initially and incorrectly attributed to save/load and saveRDS/readRDS behavior.

To manually confirm the presence of the attribute in the persistent object (saved in an .rds file) immediately after the initial saveRDS, I decided to pause the batch R script running in one terminal window using scan (readline doesn't appear to work for this in batch R scripts):

if (DEBUG) {
  cat("Press [Enter] to continue")
  key <- scan("stdin", character(), n=1)
}

and, in another terminal window, to inspect the saved object via an interactive R session.
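As a point of comparison, a pause in a batch script can also be sketched with readLines on the stdin connection. The helper name pause_for_enter and its con argument are hypothetical, introduced only for illustration; passing a connection explicitly is just a way to try the helper without a terminal.

```r
# Hypothetical helper: block until one line arrives on a connection.
# With con = NULL it reads from the process's standard input, which is
# what a script started with Rscript sees from the terminal.
pause_for_enter <- function(con = NULL) {
  if (is.null(con)) {
    con <- file("stdin")
    on.exit(close(con), add = TRUE)  # release the connection we created
  }
  cat("Press [Enter] to continue\n")
  invisible(readLines(con, n = 1))   # returns the line that was typed
}
```

In a script this would be called as if (DEBUG) pause_for_enter(), under the assumption that the script is run from a terminal with stdin attached.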

However, after the batch script had paused as expected, loading the saved object from the .rds file in an interactive session failed with the following message:

> load("../cache/SourceForge/ZGV2TGlua3M=.rds")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘ZGV2TGlua3M=.rds’ has magic number 'X'
  Use of save versions prior to 2 is deprecated

The following output describes my R environment at the time of investigation:

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

The only explanation that seems plausible to me is that the batch session (and, specifically, the pause via scan) somehow locks or modifies the environment in a way that makes it impossible to properly access R objects from within the interactive session. Perhaps there are other possible reasons for this situation. I would greatly appreciate any help or advice to solve this problem!

UPDATE:

After killing the batch R script's process (which became unresponsive after scan), I again tried to manually load the .rds file, expecting success since the batch script was no longer paused. However, to my surprise, I was greeted with the exact same error message. This makes me think that the .rds file really is corrupted (potentially due to my practice of stopping a running batch R script by repeatedly pressing Ctrl-C - I will need to come up with something more "gentle"). After figuring out a better way to stop a running script, I will try to reproduce the scenario and report here.
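If interrupted writes are a concern, one common precaution (not something the original script does) is to write to a temporary file in the same directory and rename it only after saveRDS has finished. The sketch below assumes a hypothetical helper named save_rds_atomically, and assumes that file.rename can replace an existing target, which holds on typical Linux file systems.

```r
# Hypothetical helper: write the object to a temporary file first, then
# rename it over the target, so the target never holds a partial write.
save_rds_atomically <- function(object, path) {
  tmp <- tempfile(tmpdir = dirname(path), fileext = ".tmp")
  # Remove the temporary file if anything fails before the rename.
  on.exit(if (file.exists(tmp)) unlink(tmp), add = TRUE)
  saveRDS(object, tmp)
  if (!file.rename(tmp, path)) {
    stop("could not move temporary file to ", path)
  }
  invisible(path)
}
```

A call such as save_rds_atomically(data, rdataFile) would then stand in for saveRDS(data, rdataFile).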

UPDATE 2:

After removing all (potentially corrupted) .rds files from the cache directory and following the scenario described above (loading the R data file interactively with the batch R script paused), I got exactly the same error message as before. At this point, I really need advice to figure out what's going on.

UPDATE 3 (saving the object):

assign(dataName, srdaGetData())
data <- as.name(dataName)

# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)

# save current data frame to RDS file
saveRDS(data, rdataFile)

UPDATE 4 (reproducible example):

library(RCurl)

info <- "Important data"
request <- "SELECT info FROM topSecret"
dataName <- "sf.data.devLinks"
rdataFile <- "/tmp/testAttr.rds"

getData <- function() {
  return (info)
}

requestDigest <- base64(request)

# check if the archive file has already been processed
message("\nProcessing request \"", request, "\" ...\n")

# read back the object with the attribute
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- readRDS(rdataFile)
  message("Retrieved object '", as.name(data), "', containing:\n")
  message(toString(data))

  requestAttrib <- attr(data, "SQL", exact = TRUE)
  message("\nObject '", data, "' contains attribute:\n\"",
                 base64(requestAttrib), "\"\n")

  if (identical(requestDigest, requestAttrib)) {
    message("Processing skipped: RDS file is up-to-date.\n")
    stop()
  }
  rm(data)
}

message("Saving results of request \"",
        request, "\" as R data object ...\n")

assign(dataName, getData())
data <- as.name(dataName)

# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)

# save current data frame to RDS file
saveRDS(data, rdataFile)

I expect the value of the variable named by dataName to be saved; however, the code saves the name of the variable instead.
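The difference described above can be seen in isolation: as.name() turns the string into a symbol, while get() looks up the value bound to that name. The following is a minimal sketch, not a fix confirmed by the original poster; for self-containment it stores the query as a plain string instead of calling base64 from RCurl.

```r
dataName <- "sf.data.devLinks"
assign(dataName, "Important data")

as_symbol <- as.name(dataName)  # the name itself (a symbol), not the data
as_value  <- get(dataName)      # the value currently bound to that name

# Attach the attribute to the value and round-trip it through an RDS file.
# Assumption: a plain string stands in for base64(request).
attr(as_value, "SQL") <- "SELECT info FROM topSecret"
rds_path <- tempfile(fileext = ".rds")
saveRDS(as_value, rds_path)
restored <- readRDS(rds_path)
```

Here is.symbol(as_symbol) is TRUE, whereas restored holds the string "Important data" together with its "SQL" attribute.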

Answer 1:

If you save something using saveRDS, the equivalent loading function is readRDS. If you save an object into an RData file using save, you should use load to load the object.

readRDS returns the saved object, so you can assign it to whatever name you like.

load loads the objects in an .RData file, and they will retain the names with which they were saved.

If "../cache/SourceForge/ZGV2TGlua3M=.rds" was saved using saveRDS, then

whatever <- readRDS("../cache/SourceForge/ZGV2TGlua3M=.rds")

will load the object as whatever.

Running load on a file not saved in .RData format will result in the error message you posted.
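The mismatch can be reproduced with a throwaway file. This is a small sketch using a temporary path and a made-up object, not the original cache file; the exact wording of the error may differ between R versions, so the code only records that load() fails.

```r
# Write a small object in RDS format to a temporary file.
rds_path <- tempfile(fileext = ".rds")
original <- c(a = 1, b = 2)
saveRDS(original, rds_path)

# load() expects the format written by save(), so it signals an error here.
# suppressWarnings() hides the accompanying magic-number warning.
load_result <- tryCatch(
  suppressWarnings(load(rds_path)),
  error = function(e) e
)
load_failed <- inherits(load_result, "error")

# readRDS() is the matching reader; the caller chooses the object's name.
whatever <- readRDS(rds_path)
```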