I must be very confused. Have looked around but cannot find a suitable answer and have a feeling I am doing something wrong.
Here is a minimalist example:
My function test imports a file from a folder and does subsequent analysis on that file. I have dozens of compressed files in the folder specified by path = "inst/extdata/input_data".
test = structure(function(path,letter) {
file = paste0(path, "/file_",letter,".tsv.gz")
data = read.csv(file,sep="\t",header=F,quote="\"",stringsAsFactors=F)
return(mean(data$var1))
}, ex = function(){
path = "inst/extdata/input_data"
m1 = test(path,"A")
})
I am building a package with the function in the R/ folder of the package directory.
When I set the working directory to the package parent and run the example line by line, everything works. However, when I check the package with R CMD check, it gives me the following:
cannot open file 'inst/extdata/input_data/file_A.tsv.gz': No such file or directory
Error in file(file, "rt") : cannot open the connection
I thought that when checking and building the package, the working directory is automatically set to the parent directory of the package (in my case "C:/Users/yuhu/R/Projects/ABCDpackage"), but that does not seem to be the case.
What is the best practice in this case? I would rather avoid converting all the data to .rda format and putting it in the data folder, as there are too many files. Is there a way to build the package so that the function example uses a path relative to where the package is located? This would also be helpful when the package is distributed (so it should not be my own path).
Many thanks for your help.
When R CMD check (or, later, a user) runs the example, the working directory is not your package source directory, so you need to provide the full path to the file. You can build that path easily with the system.file or the path.package command.
If your package is called foo, the following should do the trick:
}, ex = function(){
path = paste0(system.file(package = "foo"), "/extdata/input_data")
m1 = test(path,"A")
})
You might want to add a file.path command somewhere to be OS independent.
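A minimal sketch of the file.path suggestion, assuming the package really is called "foo" as above. system.file accepts the path components directly, relative to the installed package. The helper name build_input_path is hypothetical and only mirrors how test builds the file name.

```r
# Hypothetical helper: builds the file name the same way test() does,
# but with file.path() instead of pasting "/" by hand.
build_input_path <- function(dir, letter) {
  file.path(dir, paste0("file_", letter, ".tsv.gz"))
}

# "foo" is the placeholder package name from the answer above.
# The components are relative to the installed package, so "inst" is not part of them.
data_dir <- system.file("extdata", "input_data", package = "foo")

# system.file() returns "" when the package or the directory is not found.
if (nzchar(data_dir)) {
  print(build_input_path(data_dir, "A"))
} else {
  message("Package 'foo' or its extdata/input_data directory is not installed")
}
```

Inside the example, the resulting data_dir could then be passed to test in place of the hard-coded "inst/extdata/input_data".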
Since read.csv is just a wrapper for read.table, I would not expect any fundamental difference with respect to reading compressed files.
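One way to check this yourself is a small sketch like the following. It writes a throwaway gzipped, tab-separated file to a temporary location (the column name var1 is borrowed from the question, the values are made up) and reads it back with both functions.

```r
# Write a small gzip-compressed, tab-separated file to a temporary location.
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, open = "wt")
write.table(data.frame(var1 = c(1.5, 2.5, 3.5)), con,
            sep = "\t", row.names = FALSE, quote = FALSE)
close(con)

# Read the compressed file by name with both functions.
via_csv   <- read.csv(tmp, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
via_table <- read.table(tmp, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Compare the two results and use the column as test() does.
print(all.equal(via_csv, via_table))
print(mean(via_csv$var1))

unlink(tmp)
```

If both calls return the same data frame, the compression is not what breaks the example; the path is.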
Comment: R removes the "inst/" part of the directory when it installs the package (the contents of inst/ are copied to the top level of the installed package directory). This thread has a discussion on the inst directory
I think you might just want to go with read.table... At any rate give this a try.
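# open the gzipped file as a text-mode connection
fopen <- file(paste0(path,"/file_",letter,".tsv.gz"),open="rt")
data <- read.table(fopen,sep="\t",header=FALSE,quote="\"",stringsAsFactors=FALSE)
close(fopen) # read.table does not close a connection that was already open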
Refinement:
At the end of the day, I think your problem is mainly that you are using read.csv instead of read.table, which can open .gz compressed files directly. So, just to be sure, here is a little experiment I did.
Experiment:
# zip up a .csv file (in this case example_A.csv) that exists in my working directory into .gz format
system("gzip example_A.csv")
# just wanted to pass the path as a variable like you did
path <- getwd()
file <- paste0(path, "/example_", "A", ".csv.gz")
data <- read.table(file, sep=",", header=FALSE, stringsAsFactors=FALSE)
# I think these are the only options you need; stringsAsFactors=FALSE is a good one.
data <- data[1:5,1:7] # a subset of the data
  V1       V2     V3      V4     V5      V6     V7
1 id Scenario Region    Fuel  X2005   X2010  X2015
2  1 BSE9VOG4     R1 Biomass      0  2.2986 0.8306
3  2 BSE9VOG4     R1    Coal 7.4339 13.3548 9.2918
4  3 BSE9VOG4     R1     Gas 1.9918  2.4623 2.5558
5  4 BSE9VOG4     R1     LFG 0.2111  0.2111 0.2111
At the end of the day (I say that too much) you can be certain that the problem is in either the method you used to read the zipped up files or the text string you've constructed for the file names (haven't looked into the latter). At any rate best of luck with the package. I hope it turns tides.
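To check the file-name side of this, a small sketch along these lines may help. The helper check_input_file is hypothetical and not part of the original code. It builds the path the same way test does and fails early with the current working directory in the message.

```r
# Hypothetical helper: build the file name as test() does and stop with a
# descriptive message if the file is not there.
check_input_file <- function(path, letter) {
  file <- file.path(path, paste0("file_", letter, ".tsv.gz"))
  if (!file.exists(file)) {
    stop("Cannot find '", file, "' (working directory: ", getwd(), ")")
  }
  file
}

# Try it with the relative path from the question; the error is caught here
# so the message can be inspected instead of aborting.
result <- tryCatch(
  check_input_file("inst/extdata/input_data", "A"),
  error = function(e) conditionMessage(e)
)
print(result)
```

Run from the package source directory this should print the file path; run from anywhere else it prints the message with the directory R was actually in.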