I must be very confused. Have looked around but cannot find a suitable answer and have a feeling I am doing something wrong.
Here is a minimalist example:
My function test imports a file from a folder and does subsequent analysis on that file. I have dozens of compressed files in the folder specified by path = "inst/extdata/input_data".
test = structure(function(path,letter) {
file = paste0(path, "/file_",letter,".tsv.gz")
data = read.csv(file,sep="\t",header=F,quote="\"",stringsAsFactors=F)
return(mean(data$var1))
}, ex = function(){
path = "inst/extdata/input_data"
m1 = test(path,"A")
})
I am building a package with the function in the folder R/ of the package directory.
When I set the working directory to the package parent and run the example line by line, everything goes fine. However, when I check the package with R CMD check it gives me the following:
cannot open file 'inst/extdata/input_data/file_A.tsv.gz': No such file or directory
Error in file(file, "rt") : cannot open the connection
I thought that when checking and building the package, the working directory is automatically set to the parent directory of the package (which in my case is "C:/Users/yuhu/R/Projects/ABCDpackage"), but that seems not to be the case.
What is the best practice in this case? I would rather not convert all the data to .rda format and put it in the data folder, as there are too many files. Is there a way to compile the package and, in the function example, set the working directory relative to where the package is located? This would also be helpful when the package is distributed (so it should not be my own path).
Many thanks for your help.
I think you might just want to go with read.table... At any rate give this a try.
Refinement:
At the end of the day I think your problem is mainly because you are using read.csv instead of read.table which can open up .gz zipped files directly. So just to be sure. Here is a little experiment I did.
Experiment:
At the end of the day (I say that too much) you can be certain that the problem is in either the method you used to read the zipped up files or the text string you've constructed for the file names (haven't looked into the latter). At any rate best of luck with the package. I hope it turns tides.
When R CMD check (or, later, a user) runs the example, you need to provide the full path to the file! You can build that path easily with the system.file or the path.package command. If your package is called foo, call system.file with package = "foo" and the path relative to the installed package root, i.e. "extdata/input_data" without the leading "inst/". You might want to add a file.path command somewhere to be OS independent.
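A minimal sketch of that lookup, assuming the package is installed under the placeholder name foo. The helper input_dir is hypothetical and only wraps system.file; the file name file_A.tsv.gz is taken from the question.

```r
# Hypothetical helper: locate the bundled input folder of an installed package.
# "foo" is the placeholder package name used in this answer.
input_dir <- function(package = "foo") {
  # inst/extdata/input_data in the source tree is installed as
  # extdata/input_data, so "inst" is not part of the lookup.
  # system.file() returns "" when the package or folder is not found.
  system.file("extdata", "input_data", package = package)
}

dir <- input_dir()
if (nzchar(dir)) {
  # file.path() keeps the separator handling OS independent.
  file <- file.path(dir, "file_A.tsv.gz")
  print(file.exists(file))
} else {
  message("package 'foo' (or its extdata/input_data folder) is not installed")
}
```

In the example of the question, path would then be set to the result of input_dir() rather than to "inst/extdata/input_data", so the example no longer depends on the working directory.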
Since read.csv is just a wrapper for read.table, I would not expect any fundamental difference with respect to reading compressed files.
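A quick way to check this yourself, as a sketch: write a small gzip-compressed tab-separated file to a temporary location and read it back with both functions. The toy data and the column names var1 and var2 are made up for illustration.

```r
# Write a small gzip-compressed, tab-separated file to a temporary location.
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, "wt")
write.table(data.frame(var1 = c(1, 2, 3), var2 = c("a", "b", "c")),
            con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Both functions are given the compressed file name directly.
via_csv   <- read.csv(tmp, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
via_table <- read.table(tmp, sep = "\t", header = TRUE, stringsAsFactors = FALSE)

# Compare the two results and compute the mean used in the question.
print(all.equal(via_csv, via_table))
print(mean(via_csv$var1))

unlink(tmp)
```

If both calls return the same data frame, the choice between read.csv and read.table is not what causes the "cannot open file" error; the path is.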
Comment: R removes the "inst/" part of the directory when it builds the system directory. This thread has a discussion on the inst directory