I am very new to Python. I am facing an issue with "wget" as well as "urllib.urlretrieve(str(myurl),tail)".
When I run the script it downloads the files, but the filenames end with "?".
My complete code:
import os
import wget
import urllib
import subprocess

with open('/var/log/na/na.access.log') as infile, open('/tmp/reddy_log.txt', 'w') as outfile:
    results = set()
    for line in infile:
        if ' 200 ' in line:
            tokens = line.split()
            results.add(tokens[6])  # 7th token
    for result in sorted(results):
        print >>outfile, result

with open('/tmp/reddy_log.txt') as infile:
    results = set()
    for line in infile:
        head, tail = os.path.split(line)
        print tail
        myurl = "http://data.xyz.com" + str(line)
        print myurl
        wget.download(str(myurl))
        # urllib.urlretrieve(str(myurl),tail)
Output:
# python last.py
0011400026_recap.xml
http://data.na.com/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml
latest_1.xml
http://data.na.com/feeds/mobile/iphone/article/league/news/latest_1.xml
currenttime.js
Listing the files:
# ls
0011400026_recap.xml? currenttime.js? latest_1.xml? today.xml?
A possible explanation of the behaviour you experience is that you do not sanitize your input `line`.
When you iterate on a file object (`for line in infile:`), the string you get is terminated by a newline (`'\n'`) character. If you do not remove the newline before using `line`, oh well, the newline character is still there in whatever is produced by your use of `line`.
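To make this concrete, here is a minimal sketch, written for Python 3 rather than the Python 2 of the question. It uses an in-memory `io.StringIO` as a stand-in for the file (an assumption for illustration), with the two paths taken from the output shown in the question. It suggests that each line keeps its trailing newline, that `os.path.split` passes it through to the tail, and that `rstrip('\n')` removes it:

```python
import io
import os

# In-memory stand-in for the file of paths (an assumption for illustration).
infile = io.StringIO(
    "/feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml\n"
    "/feeds/mobile/iphone/article/league/news/latest_1.xml\n"
)

raw_tails = []
clean_tails = []
for line in infile:
    # Iterating over a file object yields each line with its '\n' still attached,
    # so the tail returned by os.path.split ends with '\n' as well.
    raw_tails.append(os.path.split(line)[1])
    # Stripping the newline first gives a clean filename.
    clean_tails.append(os.path.split(line.rstrip("\n"))[1])

print(raw_tails)    # tails that still carry the newline
print(clean_tails)  # tails after rstrip
```

The same `rstrip("\n")` call applied to `line` at the top of the loop in the question's script should be enough to get clean URLs and filenames.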
As an illustration of this concept, have a look at the transcript of a test I've done.
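The transcript is not reproduced here. As a rough stand-in, and not the original test, the following sketch creates files from unstripped lines and prints their names with `repr()`. It assumes Python 3 and a POSIX filesystem, where a newline is a legal filename character; the two names are borrowed from the listing in the question.

```python
import os
import tempfile

# Lines as they would come out of `for line in infile:`, newline included.
lines = ["latest_1.xml\n", "today.xml\n"]

with tempfile.TemporaryDirectory() as workdir:
    for line in lines:
        # The unstripped line is used as the filename, newline and all.
        with open(os.path.join(workdir, line), "w") as handle:
            handle.write("dummy\n")
    created = sorted(os.listdir(workdir))

# repr() makes the trailing newline visible as an escape sequence.
for name in created:
    print(repr(name))
```

Every name that comes back from `os.listdir` ends in `\n`; there is no question mark stored anywhere on disk.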
As you can see, I read lines from a file and create some files using `line` as the filename and, guess what, the filenames as listed by `ls` have a `?` at the end. But we can do better, as explained in the fine manual page of `ls`: as you can see in the output of `ls -b`, the filenames are not terminated by a question mark (it's just a placeholder used by default by the `ls` program) but by a newline character.
While I'm at it, I have to say that you should avoid using a temporary file to store the intermediate results of your computation.
A nice feature of Python is the presence of generator expressions; if you want, you can write your code as follows
Don't be fooled by the amount of comments; without comments my code is just
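The listings are not reproduced here. What follows is only a sketch of the idea in Python 3, not the code from this answer. The helper names (`successful_paths`, `download_targets`, `download_all`) and the sample log lines are made up for illustration, while the `' 200 '` filter, the 7th-token rule and the base URL come from the question's script.

```python
import os
import urllib.request

BASE_URL = "http://data.xyz.com"  # host as written in the question's script


def successful_paths(lines):
    """Return the sorted, de-duplicated 7th field of every line containing ' 200 '."""
    # Generator expression: no intermediate file is needed.
    # line.split() splits on any whitespace, so no token carries a trailing newline.
    paths = (line.split()[6] for line in lines if " 200 " in line)
    return sorted(set(paths))


def download_targets(lines):
    """Yield (url, local_filename) pairs, one per successful request path."""
    for path in successful_paths(lines):
        yield BASE_URL + path, os.path.basename(path)


def download_all(log_path):
    """Fetch every target into the current directory (needs network access)."""
    with open(log_path) as infile:
        for url, filename in download_targets(infile):
            urllib.request.urlretrieve(url, filename)


# Hypothetical access-log lines, invented only to exercise the helpers above.
sample_log = [
    '10.0.0.1 - - [01/Jan/2015:00:00:00 +0000] "GET /feeds/mobile/iphone/article/league/news/latest_1.xml HTTP/1.1" 200 512\n',
    '10.0.0.2 - - [01/Jan/2015:00:00:01 +0000] "GET /feeds/mobile/android/v2.0/video/games/high/0011400026_recap.xml HTTP/1.1" 200 900\n',
    '10.0.0.3 - - [01/Jan/2015:00:00:02 +0000] "GET /feeds/mobile/iphone/article/league/news/latest_1.xml HTTP/1.1" 200 512\n',
    '10.0.0.4 - - [01/Jan/2015:00:00:03 +0000] "GET /missing.xml HTTP/1.1" 404 0\n',
]

targets = list(download_targets(sample_log))
for url, filename in targets:
    print(url, "->", filename)
```

Calling `download_all('/var/log/na/na.access.log')` would then perform the actual downloads; it is left out of the example because it needs network access.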