Python: read website source code 100 lines at a time

Posted 2019-09-19 06:00

I'm trying to read the source code from a website 100 lines at a time.

For example:

self.code = urllib.request.urlopen(uri)

#Get 100 first lines
self.lines = self.getLines()

...

#Get 100 next lines
self.lines = self.getLines()

My getLines code is like this:

def getLines(self):
    res = []
    i = 0

    while i < 100:
        res.append(str(self.code.readline()))
        i+=1

    return res

But the problem is that getLines() always returns the first 100 lines of the code.

I've seen some solutions that use next(), or tell() and seek(), but it seems that those functions are not implemented in the HTTPResponse class.

Tags: python url urllib readline

2 Answers

骚年 ilove

#2 · 2019-09-19 06:41

This worked for me.

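#!/usr/bin/env python3

import urllib.request

def getLines(code):
    res = []
    i = 0

    while i < 100:
        # readline() returns bytes; decoding as UTF-8 is an assumption about the page
        res.append(code.readline().decode('utf-8', errors='replace'))
        i+=1

    return res

uri='http://www.google.com/'
# Open the response once and reuse it, so each call continues where the last one stopped
code = urllib.request.urlopen(uri)

#Get 100 first lines
lines = getLines(code)

print(lines)

#Get 100 next lines
lines = getLines(code)

print(lines)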


叛逆

#3 · 2019-09-19 06:55

According to the documentation, urllib.request.urlopen(uri) returns a file-like object, so you should be able to do:

from itertools import islice

def getLines(self):
    res = []
    # islice takes the next 100 lines from the response without rewinding it
    for line in islice(self.code, 100):
        res.append(line)
    return res

There's more information on islice in the itertools documentation. Using an iterator avoids the while loop and the manual counter.
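
As a rough illustration of the islice approach, the sketch below uses io.BytesIO as a stand-in for the object returned by urlopen (an assumption, made so the example runs without a network connection). The helper name get_lines and the sample data are hypothetical. Because islice consumes from the same underlying iterator, each call picks up where the previous one stopped.

```python
import io
from itertools import islice


def get_lines(stream, count=100):
    """Return up to `count` further lines from an already-open stream."""
    return list(islice(stream, count))


# Stand-in for urllib.request.urlopen(uri): 250 short lines of bytes.
fake_response = io.BytesIO(b"".join(b"line %d\n" % n for n in range(250)))

first = get_lines(fake_response)
second = get_lines(fake_response)
third = get_lines(fake_response)  # only 50 lines remain at this point

print(len(first), len(second), len(third))  # 100 100 50
print(first[0], second[0])                  # b'line 0\n' b'line 100\n'
```

The last chunk is simply shorter than 100 lines, and any further call returns an empty list, which is a convenient signal that the stream is exhausted.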

If you absolutely must use readline(), it's advisable to use a for loop, i.e.

for i in range(100):
    ...
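
If readline() is kept, one detail worth handling is the end of the stream: readline() returns an empty bytes object once nothing is left, so the loop can stop early instead of appending empty entries. A minimal sketch, again with io.BytesIO standing in for the HTTP response and a hypothetical helper name read_chunk; decoding as UTF-8 is an assumption about the page's encoding.

```python
import io


def read_chunk(stream, count=100, encoding="utf-8"):
    """Read up to `count` lines with readline(), stopping at end of stream."""
    lines = []
    for _ in range(count):
        raw = stream.readline()
        if not raw:  # b"" signals that the stream is exhausted
            break
        lines.append(raw.decode(encoding, errors="replace"))
    return lines


# Stand-in for the response object: three short lines.
fake_response = io.BytesIO(b"a\nb\nc\n")

print(read_chunk(fake_response, count=2))  # ['a\n', 'b\n']
print(read_chunk(fake_response, count=2))  # ['c\n']
print(read_chunk(fake_response, count=2))  # []
```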
