How do I seek to a particular position on a remote (HTTP) file so I can download only that part?
Let's say the bytes in a remote file are: 1234567890
I want to seek to 4 and download 3 bytes from there, so I would have: 456
Also, how do I check whether a remote file exists? I tried os.path.isfile(), but it returns False when I pass it a remote file URL.
If you are downloading the remote file through HTTP, you need to set the Range header. See this example for how it can be done. It looks like this:
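A minimal sketch of setting the Range header with the standard library's urllib.request. The function names and the 0-based offset convention are assumptions made for illustration; under that convention, getting 456 out of 1234567890 means offset 3 and length 3, i.e. bytes=3-5.

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at 0-based `offset`."""
    last = offset + length - 1  # the end position in a Range header is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


def fetch_range(url, offset, length):
    """Download only the requested slice of the remote file."""
    with urllib.request.urlopen(build_range_request(url, offset, length)) as resp:
        # 206 Partial Content means the server honoured the range;
        # 200 means it ignored the header and is sending the whole file.
        if resp.status != 206:
            raise OSError(f"server ignored the Range header (status {resp.status})")
        return resp.read()


# Usage (hypothetical URL, needs a server that supports range requests):
#   fetch_range("http://example.com/digits.txt", 3, 3)
```

Checking for status 206 matters because a server that does not support ranges will usually answer 200 and send the entire file.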
EDIT: I just found a better implementation. This class is very simple to use, as can be seen in the docstring.
Update: The "better implementation" has moved to GitHub: excid3/urlgrabber, in the byterange.py file.
I highly recommend using the requests library. It is easily the best HTTP library I have ever used. In particular, to accomplish what you have described, you would do something like:
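A rough sketch of how this might look with requests, covering both the partial download and the existence check from the question. The helper names are hypothetical, offsets are assumed to be 0-based, and treating "HEAD returns 200" as "the file exists" is a simplification that some servers will not satisfy.

```python
def range_header(offset, length):
    """Header dict asking for `length` bytes starting at 0-based `offset`."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def fetch_range(url, offset, length):
    """Download only the requested slice of the remote file."""
    import requests  # third-party (pip install requests); imported lazily here

    resp = requests.get(url, headers=range_header(offset, length), timeout=10)
    resp.raise_for_status()
    # 206 means the range was honoured; 200 means the whole file was sent.
    if resp.status_code != 206:
        raise OSError("server ignored the Range header")
    return resp.content


def remote_file_exists(url):
    """Assumption: a HEAD request answered with 200 means the file exists."""
    import requests  # third-party, see above

    resp = requests.head(url, allow_redirects=True, timeout=10)
    return resp.status_code == 200


# Usage (hypothetical URL):
#   if remote_file_exists("http://example.com/digits.txt"):
#       part = fetch_range("http://example.com/digits.txt", 3, 3)
```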
I did not find any existing implementations of a file-like interface with seek() to HTTP URLs, so I rolled my own simple version: https://github.com/valgur/pyhttpio. It depends on urllib.request but could probably easily be modified to use requests, if necessary.
The full code:
A small usage example:
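As an illustration of the idea only (this is not the pyhttpio code), a read-only file-like object with seek() over HTTP could be sketched roughly as follows. The class name HTTPRangeFile is hypothetical, and the sketch assumes the server reports Content-Length on a HEAD request and honours Range headers.

```python
import io
import urllib.request


class HTTPRangeFile(io.RawIOBase):
    """Hypothetical read-only, seekable view of a file behind an HTTP URL."""

    def __init__(self, url):
        self.url = url
        self._pos = 0
        # A HEAD request gives the total size without downloading the body.
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            self._size = int(resp.headers["Content-Length"])

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        # Seeking only moves the local position; no request is made here.
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def _fetch(self, start, end):
        """Fetch bytes start..end (inclusive) with a Range request."""
        req = urllib.request.Request(
            self.url, headers={"Range": f"bytes={start}-{end}"}
        )
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise OSError("server ignored the Range header")
            return resp.read()

    def readinto(self, buf):
        # read() and readall() from RawIOBase are built on top of readinto().
        n = min(len(buf), self._size - self._pos)
        if n <= 0:
            return 0  # at or past end of file
        data = self._fetch(self._pos, self._pos + n - 1)
        buf[: len(data)] = data
        self._pos += len(data)
        return len(data)


# Usage (hypothetical URL):
#   f = HTTPRangeFile("http://example.com/digits.txt")
#   f.seek(3)
#   part = f.read(3)
```

Every read() issues its own HTTP request, so for many small reads it would make sense to wrap the object in io.BufferedReader.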
Edit: There is actually a mostly identical, if slightly more minimal, implementation in this answer: https://stackoverflow.com/a/7852229/2997179
AFAIK, this is not possible using fseek() or similar. You need to use the HTTP Range header to achieve this. This header may or may not be supported by the server, so your mileage may vary.
EDIT: This is of course assuming that by remote file you mean a file stored on an HTTP server...
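One way to get a hint about whether a server supports ranges is to look for an Accept-Ranges: bytes header in a HEAD response. The sketch below assumes that approach and uses hypothetical function names. Note that a server may still honour Range without advertising it, so the reliable test is whether a ranged GET comes back with status 206.

```python
import urllib.request


def advertises_byte_ranges(headers):
    """True if the response headers contain 'Accept-Ranges: bytes'."""
    return headers.get("Accept-Ranges", "").strip().lower() == "bytes"


def server_advertises_ranges(url):
    """Send a HEAD request and inspect the Accept-Ranges header."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return advertises_byte_ranges(resp.headers)


# Usage (hypothetical URL):
#   server_advertises_ranges("http://example.com/digits.txt")
```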
If the file you want is on an FTP server, FTP only allows you to specify a start offset and not a range. If this is what you want, then the following code should do it (not tested!)
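A minimal sketch of the start-offset approach using the standard library's ftplib, where the rest argument of retrbinary() sends a REST command before the transfer. The function name, host, and file name are assumptions for illustration, and the server has to support REST for this to work.

```python
import ftplib


def retrieve_from_offset(ftp, remote_name, offset):
    """Return the contents of `remote_name` from byte `offset` to the end.

    `ftp` is expected to be a connected, logged-in ftplib.FTP instance.
    """
    chunks = []
    # rest=offset makes ftplib send "REST <offset>" before the RETR command,
    # so the transfer starts at that byte instead of at the beginning.
    ftp.retrbinary("RETR " + remote_name, chunks.append, rest=offset)
    return b"".join(chunks)


# Usage (hypothetical host and file name):
#   ftp = ftplib.FTP("ftp.example.com")
#   ftp.login()
#   tail = retrieve_from_offset(ftp, "digits.txt", 3)
#   ftp.quit()
```

Because FTP has no end offset, this still transfers everything after the start position; to keep only the first few bytes you would slice the result, for example tail[:3].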