I'm having a few problems with parsing simple HTML with use of the ElementTree module out of the standard Python libraries. This is my source code:
from urllib.request import urlopen
from xml.etree.ElementTree import ElementTree
import sys
def main():
site = urlopen("http://1gabba.in/genre/hardstyle")
try:
html = site.read().decode('utf-8')
xml = ElementTree(html)
print(xml)
print(xml.findall("a"))
except:
print(sys.exc_info())
if __name__ == '__main__':
main()
Either this fails, I get the following output on my console:
<xml.etree.ElementTree.ElementTree object at 0x00000000027D14E0>
(<class 'AttributeError'>, AttributeError("'str' object has no attribute 'findall'",), <traceback object at 0x0000000002910B88>)
So xml is indeed an ElementTree object, when we look at the documentation we'll see that the ElementTree class has a findall function. Extra thingie: xml.find("a") works fine, but it returns an int instead of an Element instance.
So could anybody help me out? What I am misunderstanding?