How can I use fread to read gz files in R?

Posted 2020-03-25 06:22

Question:

I am on a Windows machine trying to speed up the read.table step. My files are all .gz.

x=paste("gzip -c ",filename,sep="")
phi_raw = fread(x)

Error in fread(x) : 

I cannot understand the error. It is a bit too cryptic for me.

Not a duplicate as suggested by zx8754: I am asking specifically in the context of fread. And while fread does not have native support for gzip, this paradigm should work. See http://www.molpopgen.org/coding/datatable.html

Update

Per the suggestion below, using system yields a longer error message, though I am still stuck.

Error in fread(system(x)) : 

  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself

In addition: Warning message:


running command 'gzip -c D:/x_.gz' had status 1

Update

Running with gunzip as pointed out below:

Error in fread(system(x)) : 

  'input' must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself

In addition: Warning message:

running command 'gunzip -c D:/XX_.gz' had status 127

Note the different exit status.

Answer 1:

I often use gzip with fread on Windows. It reads the files without writing a decompressed copy to disk. I would try adding the -d option to the gzip command, because without -d, gzip compresses its input instead of decompressing it. Specifically, in your code, try x=paste("gzip -dc ",filename,sep=""). Here is a reproducible example that works on my machine:

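df <- data.frame(x = 1:10, y = letters[1:10])
write.table(df, 'df.txt', row.names = FALSE, quote = FALSE, sep = '\t')
system("which gzip")    # show which gzip executable is on the PATH
system("gzip df.txt")   # compresses in place, leaving df.txt.gz
# -d decompresses, -c writes to stdout, so no decompressed file is created
data.table::fread("gzip -dc df.txt.gz")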
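The same idea can be wrapped in a small helper. This is only a sketch: gz_command and read_gz are hypothetical names, not part of data.table, and it assumes a gzip executable is on the PATH. Note that the command string itself is passed to fread. Wrapping it in system(), as in the question's update, hands fread the value returned by system(), which by default is the numeric exit status rather than text, and that is why fread complains about 'input'.

```r
# Hypothetical helper: build a "decompress to stdout" command for one file.
# shQuote() protects paths that contain spaces.
gz_command <- function(path) {
  paste("gzip -dc", shQuote(path))
}

# Hypothetical helper: read a .gz file through gzip without writing a
# decompressed copy to disk. Assumes data.table is installed.
read_gz <- function(path, ...) {
  # Fail early with a readable message if gzip cannot be found.
  if (!nzchar(Sys.which("gzip"))) {
    stop("gzip executable not found on PATH")
  }
  # Pass the command string itself, not the result of system().
  data.table::fread(gz_command(path), ...)
}

# Usage with the file name from the question:
# phi_raw <- read_gz("D:/x_.gz")
```

Recent data.table releases may print a message when a command is passed this way. If your version of fread has a cmd argument, passing the string through it is probably the cleaner option.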

And here is my sessionInfo().

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] rsconnect_0.4.3  tools_3.3.1      data.table_1.9.6 chron_2.3-47 

I have successfully used gzip on Windows without adding a decompressed file to my hard drive using both Rtools (https://cran.r-project.org/bin/windows/Rtools/) and Gow (https://github.com/bmatzelle/gow/wiki). If my reproducible example above does not work for you, use the which gzip and which gunzip commands to see the exact .exe that is running. If it is not Rtools or Gow, perhaps try installing one of those two and trying the reproducible example again.
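If which is not available in your shell, the same check can be done from inside R with base R's Sys.which. This is a minimal sketch, and the variable names are only illustrative. An empty result means the tool is not on the PATH, which would be consistent with the status 127 reported for gunzip in the question, since that is the value R's system() typically returns when a command could not be run.

```r
# Sys.which() returns the full path of each executable found on the PATH,
# or an empty string for any that cannot be found.
tools <- Sys.which(c("gzip", "gunzip"))
print(tools)

# Report any tool that is missing so the cause of the failure is visible.
not_found <- names(tools)[!nzchar(tools)]
if (length(not_found) > 0) {
  message("Not found on PATH: ", paste(not_found, collapse = ", "))
}
```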