Python: read a website's source code 100 lines at a time

Posted 2019-09-19 06:00

I'm trying to read a website's source code 100 lines at a time.

For example:

self.code = urllib.request.urlopen(uri)

# Get the first 100 lines
self.lines = self.getLines()

...

# Get the next 100 lines
self.lines = self.getLines()

My getLines() method looks like this:

def getLines(self):
    res = []
    i = 0

    while i < 100:
        res.append(str(self.code.readline()))
        i += 1

    return res

The problem is that getLines() always returns the first 100 lines of the page source.

I've seen some solutions using next(), or tell() and seek(), but it seems those methods are not implemented in the HTTPResponse class.
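The next() idea should in fact work. In Python 3, urlopen() returns an http.client.HTTPResponse for HTTP URLs, and that class derives from io.BufferedIOBase, so it is iterable even though it is not seekable (which is why tell() and seek() fail). Below is a minimal sketch of the question's getLines() built on next(). PageReader is a hypothetical class name, io.BytesIO stands in for the real response so the example runs offline, and the UTF-8 decode is an assumption about the page's encoding. Decoding also avoids the "b'...'" strings that str() produces from bytes in Python 3.

```python
import io


class PageReader:
    """Hypothetical stand-in for the question's class."""

    def __init__(self, code):
        # `code` is any binary file-like object; in the question it would be
        # the result of urllib.request.urlopen(uri).
        self.code = code

    def getLines(self, n=100):
        res = []
        for _ in range(n):
            line = next(self.code, None)  # None once the stream is exhausted
            if line is None:
                break
            res.append(line.decode("utf-8"))  # assumes a UTF-8 page
        return res


# io.BytesIO plays the role of the HTTP response: 250 fake lines of "source".
page = io.BytesIO(b"".join(b"line %d\n" % i for i in range(250)))
reader = PageReader(page)

first = reader.getLines()   # lines 0-99
second = reader.getLines()  # lines 100-199
third = reader.getLines()   # lines 200-249 (only 50 remain)
print(first[0], second[0], len(third))
```

Because every call pulls from the same object, each chunk starts where the previous one stopped; calling urlopen() again would instead start a fresh response at line 1.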

2 answers
骚年 ilove
Reply #2 · 2019-09-19 06:41

This worked for me.

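#!/usr/bin/env python3

import urllib.request

def getLines(code):
    res = []
    i = 0

    while i < 100:
        # readline() returns bytes in Python 3; decode them (assumes UTF-8)
        res.append(code.readline().decode('utf-8', errors='replace'))
        i += 1

    return res

uri = 'http://www.google.com/'
code = urllib.request.urlopen(uri)

# Get the first 100 lines
lines = getLines(code)

print(lines)

# Get the next 100 lines (same response object, so reading continues)
lines = getLines(code)

print(lines)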
叛逆
Reply #3 · 2019-09-19 06:55

According to the documentation, urllib.request.urlopen(uri) returns a file-like object, so you should be able to do:

from itertools import islice

def getLines(self):
    res = []
    # islice stops after 100 lines; the next call resumes where this one ended
    for line in islice(self.code, 100):
        res.append(line)
    return res

There's more information on islice in the itertools documentation. Iterating directly over the response avoids the while loop and the manual counter.
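Taking the islice approach one step further, a generator can keep slicing the same file-like object until it is exhausted, yielding one batch of up to 100 lines at a time. This is a minimal sketch; iter_batches is a hypothetical name, and io.BytesIO stands in for the HTTP response so it runs offline.

```python
import io
from itertools import islice


def iter_batches(fileobj, size=100):
    """Yield lists of up to `size` lines until the file-like object is exhausted."""
    while True:
        # A file object is its own iterator, so each islice() call continues
        # from where the previous batch stopped.
        batch = list(islice(fileobj, size))
        if not batch:
            return
        yield batch


page = io.BytesIO(b"".join(b"line %d\n" % i for i in range(250)))
sizes = [len(batch) for batch in iter_batches(page)]
print(sizes)  # → [100, 100, 50]
```

With a real response you would pass the object returned by urllib.request.urlopen(uri) instead of the BytesIO; the lines are still bytes and need decoding.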

If you absolutely must use readline(), a for loop is still preferable, e.g.:

for i in range(100):  # xrange(100) on Python 2
    ...
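Filled in, that readline() loop might look like the following sketch. readline() returns an empty bytes object at the end of the stream, so the loop stops there instead of appending empty lines. read_chunk is a hypothetical name, and io.BytesIO again stands in for the response.

```python
import io


def read_chunk(code, n=100):
    """Return up to n raw lines (bytes) read from a binary file-like object."""
    res = []
    for _ in range(n):
        line = code.readline()
        if line == b"":  # end of stream reached
            break
        res.append(line)
    return res


page = io.BytesIO(b"a\nb\nc\n")
print(read_chunk(page, 2))  # → [b'a\n', b'b\n']
print(read_chunk(page, 2))  # → [b'c\n']
```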