Python URL variable int add to string [closed]

Posted 2019-09-22 03:17

Question:

pgno = 1
while pgno < 4304:
    result = urllib.urlopen("http://www.example.com/traderesources/pincode.aspx?" +
                            "&GridInfo=Pincode0"+ pgno)
    print pgno
    html = result.read()
    parser = etree.HTMLParser()
    tree   = etree.parse(StringIO.StringIO(html), parser)
    pgno += 1

In http://.......=Pincode0 I need to append a number, e.g. 'Pincode01', and then step through 02, 03, and so on. I am using a while loop for this, and the counter variable is 'pgno'.

The problem is the counter is adding 1, but 'Pincode01' is not becoming 'Pincode02' ... therefore it is not opening the 2nd page of the site.

I even tried +str(pgno)) ... no luck.

Please show me how to do this. I have tried several times without success.

Answer 1:

If your problem is the format of the number, use string formatting instead of adding a str to an int:

>>> pgno = 1
>>> while pgno < 20:
...     print '%02d' % pgno
...     pgno += 1
... 
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19

See the string format docs for more options.

Alternatively, a more Pythonic way is to use str.format:

>>> for pgno in range(9, 12):
...    print '{0:02d}'.format(pgno)
... 
09
10
11
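
Applied to the URL from the question, the same zero-padding can be done while building the string. This is only a sketch in Python 3 syntax (the thread itself uses Python 2); `BASE_URL` and `page_url` are names made up for illustration, and it assumes the site expects page numbers padded to at least two digits, as the question describes.

```python
# Hypothetical base URL, modelled on the one in the question.
BASE_URL = "http://www.example.com/traderesources/pincode.aspx?&GridInfo=Pincode"


def page_url(pgno, width=2):
    """Return the URL for page `pgno`, zero-padded to at least `width` digits."""
    # The format spec 0{width}d pads with zeros; the int is never added to a str.
    return f"{BASE_URL}{pgno:0{width}d}"


for pgno in range(1, 4):
    print(page_url(pgno))
```

Numbers with more digits than `width` are not truncated, so page 4303 would simply end in `Pincode4303`.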


Answer 2:

You probably want this:

from urllib import urlopen
import re

pgno = 2
url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % pgno
print url + '\n'
sock = urlopen(url)
htmlcode = sock.read()
sock.close()

# Offset of the first postback link; the row search below starts from here.
x = re.search('%;"><a href="javascript:__doPostBack', htmlcode).start()

# One table row: a numeric pincode cell followed by three text cells.
pat = (r'\t\t\t\t<td style="width:\d+%;">(\d+)</td>'
       r'<td style="width:\d+%;">(.+?)</td>'
       r'<td style="width:\d+%;">(.+?)</td>'
       r'<td style="width:30%;">(.+?)</td>\r\n')
regx = re.compile(pat)

print '\n'.join(map(repr, regx.findall(htmlcode, x)))

Result:

http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode02

('110001', 'New Delhi', 'Delhi', 'Baroda House')
('110001', 'New Delhi', 'Delhi', 'Bengali Market')
('110001', 'New Delhi', 'Delhi', 'Bhagat Singh Market')
('110001', 'New Delhi', 'Delhi', 'Connaught Place')
('110001', 'New Delhi', 'Delhi', 'Constitution House')
('110001', 'New Delhi', 'Delhi', 'Election Commission')
('110001', 'New Delhi', 'Delhi', 'Janpath')
('110001', 'New Delhi', 'Delhi', 'Krishi Bhawan')
('110001', 'New Delhi', 'Delhi', 'Lady Harding Medical College')
('110001', 'New Delhi', 'Delhi', 'New Delhi Gpo')
('110001', 'New Delhi', 'Delhi', 'New Delhi Ho')
('110001', 'New Delhi', 'Delhi', 'North Avenue')
('110001', 'New Delhi', 'Delhi', 'Parliament House')
('110001', 'New Delhi', 'Delhi', 'Patiala House')
('110001', 'New Delhi', 'Delhi', 'Pragati Maidan')
('110001', 'New Delhi', 'Delhi', 'Rail Bhawan')
('110001', 'New Delhi', 'Delhi', 'Sansad Marg Hpo')
('110001', 'New Delhi', 'Delhi', 'Sansadiya Soudh')
('110001', 'New Delhi', 'Delhi', 'Secretariat North')
('110001', 'New Delhi', 'Delhi', 'Shastri Bhawan')
('110001', 'New Delhi', 'Delhi', 'Supreme Court')
('110002', 'New Delhi', 'Delhi', 'Rajghat Power House')
('110002', 'New Delhi', 'Delhi', 'Minto Road')
('110002', 'New Delhi', 'Delhi', 'Indraprastha Hpo')
('110002', 'New Delhi', 'Delhi', 'Darya Ganj')

I wrote the regex-based code above after studying the structure of the HTML source with the following code (I think you'll understand it without further explanation):

from collections import defaultdict
from urllib2 import urlopen

pgno = 2
url = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode0%s" % pgno
print url + '\n'
sock = urlopen(url)
htmlcode = sock.read()
sock.close()

li = htmlcode.splitlines(True)

# Show lines 276 to 299 of the page, each prefixed with its index.
print '\n'.join(str(i) + ' ' + repr(line) + '\n' for i, line in enumerate(li) if 275 < i < 300)

# Count how often each character occurs in the first 291 lines.
ch = ''.join(li[0:291])
didi = defaultdict(int)
for c in ch:
    didi[c] += 1

# List the characters of line 289 that occur fewer than 35 times in that text.
print '\n\n' + repr(li[289])
print '\n'.join('%r -> %s' % (c, didi[c]) for c in li[289] if didi[c] < 35)
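
The character count in that script is a way of finding characters that are rare in the page up to a given line, which makes them useful anchors for a regular expression. A minimal Python 3 sketch of the same idea with `collections.Counter` follows; `rare_chars` is a hypothetical helper, and unlike the script above it counts only up to and including the target line.

```python
from collections import Counter


def rare_chars(lines, target_index, threshold):
    """Return (char, count) pairs for the characters of lines[target_index]
    that occur fewer than `threshold` times in lines[0..target_index]."""
    counts = Counter("".join(lines[:target_index + 1]))
    return [(c, counts[c]) for c in lines[target_index] if counts[c] < threshold]


# Made-up sample text, only to show the call.
sample = ["aaa\n", "ab\n", "a%b\n"]
for char, count in rare_chars(sample, 2, 3):
    print("%r -> %s" % (char, count))
```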


Now, the problem is that the same HTML is returned for every value of pgno. The site may be detecting that a program, rather than a browser, is connecting to fetch data. This probably has to be handled with the tools in urllib2, but I'm not familiar with them.
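
One thing that could be tried is sending a browser-like User-Agent header with each request. The sketch below uses Python 3's `urllib.request` (the successor of urllib2) and only builds the request without sending it; `build_request` and the header value are made up for illustration. Whether this site actually keys on the User-Agent is an assumption. The page's links call `javascript:__doPostBack`, which suggests paging may be driven by a form POST rather than the query string, in which case a header alone would not change the result.

```python
import urllib.request

# URL pattern taken from the answer above; the two-digit padding is an assumption.
URL_TEMPLATE = "http://www.eximguru.com/traderesources/pincode.aspx?&GridInfo=Pincode%02d"


def build_request(pgno):
    """Build (but do not send) a request for page `pgno` with a browser-like header."""
    headers = {"User-Agent": "Mozilla/5.0 (compatible; example-script)"}  # illustrative value
    return urllib.request.Request(URL_TEMPLATE % pgno, headers=headers)


req = build_request(2)
print(req.full_url)
# To actually fetch the page: html = urllib.request.urlopen(req).read()
```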



Answer 3:

The loop:

pgno = 1
while pgno < 4304:
    print pgno
    pgno += 1

works correctly, and the number does increase.

Either you are describing the problem incorrectly, or one of your basic assumptions about it is wrong. Can you please describe what you are trying to do in the first place?