I am currently trying to read a txt file from a website.
My script so far is:
webFile = urllib.urlopen(currURL)
This way, I can work with the file. However, when I try to store the file (in webFile), I only get a link to the socket. Another solution I tried was to use read():
webFile = urllib.urlopen(currURL).read()
However, this seems to remove the formatting (\n, \t, etc.).
If I open the file like this:
webFile = urllib.urlopen(currURL)
I can read it line by line:
for line in webFile:
    print line
This should result in:
"this"
"is"
"a"
"textfile"
But I get:
't'
'h'
'i'
...
I wish to get the file on my computer, but maintain the format at the same time.
You should use readlines() to read the response as a list of whole lines:
response = urllib.urlopen(currURL)
lines = response.readlines()
for line in lines:
.
.
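For readers on Python 3, where urlopen lives in urllib.request and returns bytes, the readlines() approach might look roughly like the sketch below. The helper name read_lines and the utf-8 default are assumptions for illustration, not part of the original answer.

```python
from urllib.request import urlopen


def read_lines(url, encoding="utf-8"):
    """Return the text at `url` as a list of lines, line endings included.

    `read_lines` is a hypothetical helper; the utf-8 default is an assumption.
    """
    with urlopen(url) as response:
        # In Python 3, readlines() returns bytes objects, so decode each line
        return [raw.decode(encoding) for raw in response.readlines()]
```

Calling read_lines(currURL) and looping over the result with print(line, end="") should print the text with its original line breaks and tabs.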
However, I strongly recommend using the requests library: http://docs.python-requests.org/en/latest/
This is because you are iterating over a string, which yields one character at a time.
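A small offline illustration of that difference, written for Python 3. The sample text is made up and stands in for what read() would return once decoded.

```python
# Made-up sample standing in for the decoded result of read()
text = "this\nis\na\ntextfile\n"

# Iterating over a str yields single characters
chars = [piece for piece in text]

# splitlines() yields whole lines; keepends=True keeps each "\n"
lines = text.splitlines(keepends=True)

print(chars[:3])  # ['t', 'h', 'i']
print(lines)      # ['this\n', 'is\n', 'a\n', 'textfile\n']
```

The string itself still contains every \n and \t; only the way it is iterated makes it look as though the formatting was lost.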
Why not save the whole file at once?
import urllib
webf = urllib.urlopen('http://stackoverflow.com/questions/32971752/python-read-file-from-web-site-url')
txt = webf.read()
f = open('destination.txt', 'w+')
f.write(txt)
f.close()
If you really want to loop over the file line by line, use txt = webf.readlines() and iterate over that.
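The code in this answer targets Python 2. A rough Python 3 sketch of the same save-the-whole-file idea could look like this; save_url and its parameter names are hypothetical, and writing in binary mode is a choice made here so that line endings and tabs are copied untouched.

```python
from urllib.request import urlopen


def save_url(url, destination):
    """Copy the resource at `url` to the local path `destination`.

    `save_url` is a hypothetical helper. Returns the number of bytes written.
    """
    with urlopen(url) as response:
        data = response.read()  # bytes in Python 3
    # "wb" skips newline translation, so \n and \t are written exactly as received
    with open(destination, "wb") as out:
        out.write(data)
    return len(data)
```

For example, save_url(currURL, 'destination.txt') would write the downloaded bytes to destination.txt, and the with blocks close both the response and the file even if an error occurs.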
If you're just trying to save a remote file to your local server as part of a Python script, you could use the PycURL library to download and save it without parsing it. More information: http://pycurl.sourceforge.net
Alternatively, if you want to read and then write the output, I think you've just got the methods out of sequence. Try the following:
# Assign the open file to a variable
webFile = urllib.urlopen(currURL)
# Read the file contents to a variable
file_contents = webFile.read()
print(file_contents)
# Output: the file contents
# Then write to a new local file
f = open('local file.txt', 'w')
f.write(file_contents)
# Close the file so the data is flushed to disk
f.close()
If neither applies, please update the question to clarify.