I am new to python. I can't figure out what I am doing wrong when trying to read the contents of .tar.gz file into python. The tarfile I would like to read is hosted at the following web address:
ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz
more info on file at this site (just so you can trust contents) http://www.pubmedcentral.nih.gov/utils/oa/oa.fcgi?id=PMC13901
The tarfile contains .pdf and .nxml copies of the journal article. And also a couple of image files.
If I open the file in my browser by copying and pasting. I can save to a location on my PC and import the tarfile fine using the following commands (note: winzip changes the file from .tar.gz to simply .tar when I save to location):
import tarfile
thetarfile = "C:/Users/dfcm/Documents/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar"
tfile = tarfile.open(thetarfile)
tfile
However, if I try to access the file directly using similar commands:
thetarfile = "ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar.gz"
bbb = tarfile.open(thetarfile)
That results in the following error:
Traceback (most recent call last):
File "<pyshell#137>", line 1, in <module>
bbb = tarfile.open(thetarfile)
File "C:\Python30\lib\tarfile.py", line 1625, in open
return func(name, "r", fileobj, **kwargs)
File "C:\Python30\lib\tarfile.py", line 1687, in gzopen
fileobj = bltn_open(name, mode + "b")
File "C:\Python30\lib\io.py", line 278, in __new__
return open(*args, **kwargs)
File "C:\Python30\lib\io.py", line 222, in open
closefd)
File "C:\Python30\lib\io.py", line 615, in __init__
_fileio._FileIO.__init__(self, name, mode, closefd)
IOError: [Errno 22] Invalid argument: 'ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/b0/ac/Breast_Cancer_Res_2001_Nov_9_3(1)_61-65.tar'
Can anyone explain what I am doing wrong when trying to read the .tar.gz file directly from the web address? Thanks in advance. Chris