While debugging my problem with retrieving attributes (Can I access R data objects' attributes without fully loading objects from file?), and based on advice here on SO, I switched from using save() and load() to saveRDS() and readRDS(), respectively.
My investigation (via non-interactive debug printing) showed the following:
immediately after the initial saveRDS(), the saved object contains the attribute in question;
an interactive R session, started after the initial run of the script, shows that the attribute is absent from the saved object;
these findings explain the failure to retrieve the attribute during the next run of the script, which I initially incorrectly attributed to save/load and saveRDS/readRDS behavior.
To manually confirm the presence of the attribute in the persistent object (saved in an .rds file) immediately after the initial saveRDS(), I decided to pause the batch R script running in one terminal window using scan() (readline() doesn't appear to work for this in batch R scripts):
if (DEBUG) {
  cat("Press [Enter] to continue")
  key <- scan("stdin", character(), n=1)
}
and, in another terminal window, to inspect the saved object via an interactive R session.
However, after the batch script had paused as expected, loading the saved object from the .rds file in the interactive session failed with the following message:
> load("../cache/SourceForge/ZGV2TGlua3M=.rds")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘ZGV2TGlua3M=.rds’ has magic number 'X'
Use of save versions prior to 2 is deprecated
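One thing worth checking at this point is which reader matches which writer. As far as the base R documentation goes, load() reads files written by save(), while files written by saveRDS() are read back with readRDS(). The sketch below is only an illustration under assumptions: the temporary file and the small object are made up for the example and do not come from the script above.

```r
# Hypothetical temporary file and object, used only for illustration
tmp <- tempfile(fileext = ".rds")
obj <- structure(c(1, 2, 3), SQL = "example")

saveRDS(obj, tmp)

# readRDS() is the counterpart of saveRDS() and returns the object itself
restored <- readRDS(tmp)
print(attributes(restored))

# load() expects a file written by save(); capture what happens on an .rds file
result <- tryCatch(
  {
    load(tmp)
    "loaded"
  },
  error = function(e) conditionMessage(e)
)
print(result)

unlink(tmp)
```

If the same error text shows up in `result`, the message above may say more about the reader function than about file corruption.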
The following output describes my R environment at the time of investigation:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
The only explanation that seems plausible to me is that the batch session (specifically, the pause via scan()) somehow locks or modifies the environment in a way that makes it impossible to properly access R objects from within the interactive session. Perhaps there are other possible reasons for this situation. I would greatly appreciate any help or advice on solving this problem!
UPDATE:
After killing the batch R script's process (which became unresponsive after scan()), I again tried to manually load the .rds file, expecting success since the batch script was no longer paused. However, to my surprise, I was greeted with the exact same error message. This makes me think that the .rds file really is corrupted (potentially due to my practice of stopping a running batch R script by repeatedly pressing Ctrl-C; I will need to come up with something more "gentle"). After figuring out a better way to stop a running script, I will try to reproduce the scenario and report here.
UPDATE 2:
After removing all (potentially corrupted) .rds files from the cache directory and following the scenario described above (loading the R data file interactively with the batch R script paused), I got exactly the same error message as before. At this point, I really need advice to figure out what's going on.
UPDATE 3 (saving the object):
assign(dataName, srdaGetData())
data <- as.name(dataName)
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RDS file
saveRDS(data, rdataFile)
UPDATE 4 (reproducible example):
library(RCurl)
info <- "Important data"
request <- "SELECT info FROM topSecret"
dataName <- "sf.data.devLinks"
rdataFile <- "/tmp/testAttr.rds"
getData <- function() {
  return (info)
}
requestDigest <- base64(request)
# check if the archive file has already been processed
message("\nProcessing request \"", request, "\" ...\n")
# read back the object with the attribute
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- readRDS(rdataFile)
  message("Retrieved object '", as.name(data), "', containing:\n")
  message(toString(data))
  requestAttrib <- attr(data, "SQL", exact = TRUE)
  message("\nObject '", data, "' contains attribute:\n\"",
          base64(requestAttrib), "\"\n")
  if (identical(requestDigest, requestAttrib)) {
    message("Processing skipped: RDS file is up-to-date.\n")
    stop()
  }
  rm(data)
}
message("Saving results of request \"",
request, "\" as R data object ...\n")
assign(dataName, getData())
data <- as.name(dataName)
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RDS file
saveRDS(data, rdataFile)
I expect the value of the variable named in dataName to be saved; however, the code saves the name of the variable instead.
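To make the difference concrete, here is a minimal sketch contrasting as.name(), which returns a symbol, with get(), which returns the value bound to a name. It is an illustration under assumptions rather than a fix for the script: the temporary file path and the "example-digest" string are placeholders (the latter stands in for base64(request), so RCurl is not needed), and only dataName and the data string are taken from the example above.

```r
dataName <- "sf.data.devLinks"           # name taken from the example above
rdsFile  <- tempfile(fileext = ".rds")   # hypothetical path for this sketch

assign(dataName, "Important data")

sym <- as.name(dataName)   # a symbol, i.e. the name itself, not the data
val <- get(dataName)       # the value bound to that name

# Attach the attribute to the value, then save and read it back
attr(val, "SQL") <- "example-digest"     # placeholder for base64(request)
saveRDS(val, rdsFile)
restored <- readRDS(rdsFile)

print(class(sym))
print(restored)
print(attr(restored, "SQL", exact = TRUE))

unlink(rdsFile)
```

Comparing class(sym) with class(val) should show which of the two objects ends up in the .rds file in each variant.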