While debugging my problem with retrieving attributes (Can I access R data objects' attributes without fully loading objects from file?), and based on advice here on SO, I switched from using save() and load() to saveRDS() and readRDS(), respectively.
My investigation (via non-interactive debug printing) showed the following:
immediately after the initial saveRDS(), the saved object contains the attribute in question;
an interactive R session, started after the initial run of the script, shows that the attribute is absent from the saved object;
these findings explain the failure to retrieve the attribute during the next run of the script, which I initially incorrectly attributed to the behavior of save/load and saveRDS/readRDS.
To manually confirm the presence of the attribute in the persistent object (saved in an .rds file) immediately after the initial saveRDS, I decided to pause the batch R script running in one terminal window using scan (readline doesn't appear to work for this in batch R scripts, since it returns immediately in non-interactive use):
if (DEBUG) {
  cat("Press [Enter] to continue")
  key <- scan("stdin", character(), n=1)
}
and, in another terminal window, to inspect the saved object via an interactive R session.
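For what it's worth, another way to pause a batch script is to read one line from a connection to standard input. The following is only a sketch, assuming the script is started with Rscript so that stdin is the terminal; pause_for_enter is a hypothetical helper name, not something from the original script.

```r
# Hypothetical helper: wait until a line arrives on the given connection.
# By default it reads from the process's standard input.
# Note that the connection is closed when the function returns.
pause_for_enter <- function(con = file("stdin")) {
  on.exit(close(con))
  cat("Press [Enter] to continue")
  invisible(readLines(con, n = 1))
}

DEBUG <- FALSE  # set to TRUE to actually pause the script

if (DEBUG) {
  pause_for_enter()
}
```

Passing the connection as an argument makes it possible to try the helper without a terminal, for example with a textConnection.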
However, once the batch script had paused as expected, loading the saved object from the .rds file in an interactive session failed with the following message:
> load("../cache/SourceForge/ZGV2TGlua3M=.rds")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘ZGV2TGlua3M=.rds’ has magic number 'X'
Use of save versions prior to 2 is deprecated
The following output describes my R environment at the time of investigation:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
The only explanation that seems plausible to me is that the batch session (and, specifically, the pause via scan) somehow locks or modifies the environment in a way that makes it impossible to properly access R objects from within the interactive session. Perhaps there are other possible reasons for this situation. I would greatly appreciate any help or advice on solving this problem!
UPDATE:
After killing the batch R script's process (which became unresponsive after scan), I again tried to manually load the .rds file, expecting success due to the absence of the pause in the batch script. However, to my surprise, I was greeted with the exact same error message. This makes me think that the .rds file really is corrupted (potentially due to my practice of stopping a running batch R script by repeatedly pressing Ctrl-C; I will need to come up with something more "gentle"). After figuring out a better way to stop a running script, I will try to reproduce the scenario and report here.
UPDATE 2:
After removing all (potentially corrupted) .rds files from the cache directory and following the scenario described above (loading the R data file interactively with the batch R script paused), I got exactly the same error message as before. At this point, I really need advice to figure out what's going on.
UPDATE 3 (saving the object):
assign(dataName, srdaGetData())
data <- as.name(dataName)
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RDS file
saveRDS(data, rdataFile)
UPDATE 4 (reproducible example):
library(RCurl)
info <- "Important data"
request <- "SELECT info FROM topSecret"
dataName <- "sf.data.devLinks"
rdataFile <- "/tmp/testAttr.rds"
getData <- function() {
  return (info)
}
requestDigest <- base64(request)
# check if the archive file has already been processed
message("\nProcessing request \"", request, "\" ...\n")
# read back the object with the attribute
if (file.exists(rdataFile)) {
  # now check if request's SQL query hasn't been modified
  data <- readRDS(rdataFile)
  message("Retrieved object '", as.name(data), "', containing:\n")
  message(toString(data))
  requestAttrib <- attr(data, "SQL", exact = TRUE)
  message("\nObject '", data, "' contains attribute:\n\"",
          base64(requestAttrib), "\"\n")
  if (identical(requestDigest, requestAttrib)) {
    message("Processing skipped: RDS file is up-to-date.\n")
    stop()
  }
  rm(data)
}
message("Saving results of request \"",
        request, "\" as R data object ...\n")
assign(dataName, getData())
data <- as.name(dataName)
# save hash of the request's SQL query as data object's attribute,
# so that we can detect when configuration contains modified query
attr(data, "SQL") <- base64(request)
# save current data frame to RDS file
saveRDS(data, rdataFile)
I expect the value of the variable whose name is stored in dataName to be saved; however, the code saves the name of the variable instead.
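One possible way to get the behavior described above is to fetch the value with get() rather than building a symbol with as.name(). This is a minimal sketch, not a verified fix for the full script: the temporary file path is an assumption, and the plain query string is stored as the attribute instead of the base64() result from RCurl, only to keep the example free of package dependencies.

```r
# Stand-ins for the values used in the question
request   <- "SELECT info FROM topSecret"
dataName  <- "sf.data.devLinks"
rdataFile <- tempfile(fileext = ".rds")  # assumed location, for illustration

getData <- function() "Important data"

assign(dataName, getData())

# get() returns the value bound to the name stored in dataName;
# as.name() would return only the symbol, not the data
data <- get(dataName)

# The question stores base64(request); the raw string is used here
# so that the sketch runs without RCurl
attr(data, "SQL") <- request

saveRDS(data, rdataFile)

# Read the object back and inspect both the value and the attribute
restored <- readRDS(rdataFile)
print(restored)
print(attr(restored, "SQL", exact = TRUE))
```

With this change the saved object is the character vector itself, so the attribute set on it travels with it through saveRDS and readRDS.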
If you save something using saveRDS, the equivalent loading function is readRDS. If you save an object into an RData file, you should use load to load the object. readRDS will allow you to specify the name of the object being loaded. load loads the objects in an .RData file, and they will retain the names with which they were saved.
If "../cache/SourceForge/ZGV2TGlua3M=.rds" was saved using saveRDS, then
whatever <- readRDS("../cache/SourceForge/ZGV2TGlua3M=.rds")
will load the object as whatever.
Running load on a file not saved in .RData format will result in the error message you posted.
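The difference can be illustrated with a small, self-contained sketch. The object, its note attribute, and the temporary file names below are made up for the example and are not taken from the question.

```r
# A small object with an attribute, used only for illustration
x <- structure(1:3, note = "kept")

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")

saveRDS(x, rds_file)        # stores one object, without its name
save(x, file = rdata_file)  # stores the object together with its name

# readRDS returns the object, so it can be bound to any name
renamed <- readRDS(rds_file)

# load restores the object under its original name and returns that name
rm(x)
loaded_names <- load(rdata_file)

# Calling load on a file written by saveRDS fails; capture the error message
mismatch <- tryCatch(
  suppressWarnings({
    load(rds_file)
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(mismatch)
```

Here renamed keeps the note attribute, loaded_names contains "x", and mismatch holds the error raised by load when it is given the .rds file.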