Python HTML parsing from url [closed]

Posted 2019-09-22 06:35

Question:

I've heard it's possible to fetch data from a URL and parse it. I've read about this, but I still want to know how to do it and which module is best for the job. I want to parse this:

<div class="blalbal"><h2>DATA5</h2>
<div class="blabla">
<table class="tabledata">
<tr><th>Blablabla:</th><td>DATA3<br>(DATA4)</td></tr>
<tr><th>Blablabla:</th><td>DATA2</td></tr>
<tr><th>Blablabla:</th><td>DATA1</td></tr>
</td>

as a string, like DATA1, DATA2, DATA3 (DATA4), DATA5

So I'd like to see how this is possible (just an example) and what the best and fastest method is. Thanks!

Answer 1:

From the Python HTMLParser documentation:

from html.parser import HTMLParser  # Python 3; in Python 2 the module was named HTMLParser

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Encountered a start tag:", tag)
    def handle_endtag(self, tag):
        print("Encountered an end tag :", tag)
    def handle_data(self, data):
        print("Encountered some data  :", data)

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')

In your case, you can just override the handle_data method to print the text content of the HTML.
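To make that concrete, here is a minimal sketch of how handle_data could be combined with the start and end tag handlers to build the string from the question. It assumes Python 3 (the `html.parser` module) and that only the text inside `<td>` and `<h2>` elements is wanted. The names `CellTextParser`, `extract_fields`, `fetch_html` and `SAMPLE_HTML` are made up for this illustration, and the reversed output order is just a guess at what the question asks for.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# The fragment from the question, copied as-is (including the stray </td>).
SAMPLE_HTML = """<div class="blalbal"><h2>DATA5</h2>
<div class="blabla">
<table class="tabledata">
<tr><th>Blablabla:</th><td>DATA3<br>(DATA4)</td></tr>
<tr><th>Blablabla:</th><td>DATA2</td></tr>
<tr><th>Blablabla:</th><td>DATA1</td></tr>
</td>
"""


class CellTextParser(HTMLParser):
    """Hypothetical helper: collects the text of every <td> and <h2>."""

    def __init__(self):
        super().__init__()
        self.cells = []        # one string per finished <td> or <h2>
        self._current = None   # tag being captured, or None
        self._buffer = []      # text pieces of the element being captured

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "h2"):
            self._current = tag
            self._buffer = []
        elif tag == "br" and self._current:
            # Treat a line break inside a captured element as a space.
            self._buffer.append(" ")

    def handle_endtag(self, tag):
        # A closing tag with no matching capture (like the stray </td>) is ignored.
        if self._current and tag == self._current:
            text = " ".join("".join(self._buffer).split())
            self.cells.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)


def extract_fields(html):
    """Return the captured texts as one string, last element first."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return ", ".join(reversed(parser.cells))


def fetch_html(url):
    """Download a page and decode it (assumes UTF-8 if no charset is sent)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


print(extract_fields(SAMPLE_HTML))  # DATA1, DATA2, DATA3 (DATA4), DATA5
```

For a real page you would call `extract_fields(fetch_html(url))` with your own URL. On larger or messier pages, a third-party parser such as BeautifulSoup or lxml is probably more convenient, but the standard library is enough for a fragment like this.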