I have installed lxml 2.2.2 on Windows (I'm using Python 2.6.5). I tried this simple command:
from lxml.html import parse
p = parse('http://www.google.com').getroot()
but I am getting the following error:
Traceback (most recent call last):
  File "", line 1, in
    p=parse('http://www.google.com').getroot()
  File "C:\Python26\lib\site-packages\lxml-2.2.2-py2.6-win32.egg\lxml\html\__init__.py", line 661, in parse
    return etree.parse(filename_or_url, parser, base_url=base_url, **kw)
  File "lxml.etree.pyx", line 2698, in lxml.etree.parse (src/lxml/lxml.etree.c:49590)
  File "parser.pxi", line 1491, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71205)
  File "parser.pxi", line 1520, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:71488)
  File "parser.pxi", line 1420, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:70583)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:67736)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64056)
IOError: Error reading file 'http://www.google.com': failed to load external entity "http://www.google.com"
I am clueless as to what to do next, as I am a newbie to Python. Please guide me to solve this error. Thanks in advance! :)
lxml.html.parse does not fetch URLs. Here's how to do it with urllib2:
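A minimal sketch of that approach: open the URL yourself and pass the open response to lxml.html.parse, which also accepts file-like objects. It is written for Python 3, where urllib.request.urlopen takes the place of urllib2.urlopen. The helper name parse_url and its opener and timeout parameters are assumptions made for illustration, not part of lxml.

```python
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen


def parse_url(url, opener=urlopen, timeout=10):
    """Fetch `url` ourselves and let lxml parse the open response.

    `parse_url`, `opener` and `timeout` are hypothetical names for this
    sketch; only `lxml.html.parse` and `urlopen` are real library calls.
    """
    # Imported here so this module still loads when lxml is not installed.
    from lxml.html import parse

    response = opener(url, timeout=timeout)
    try:
        # parse() accepts a file-like object as well as a filename or URL.
        return parse(response).getroot()
    finally:
        response.close()
```

Calling parse_url('http://www.google.com') should return the root html element without relying on libxml2's own HTTP support.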
Update
Steven is right. lxml.etree.parse should accept and load URLs. I missed that. I've tried deleting this answer, but I'm not allowed. I retract my statement about it not fetching URLs.
Since line breaks are not allowed in comments, here's my implementation of MattH's answer:
According to the API docs it should work: http://lxml.de/api/lxml.html-module.html#parse
This seems to be a bug in lxml 2.2.2. I just tested on Windows with Python 2.6 and 2.7, and it does work with lxml 2.3.0.
So: upgrade your lxml and you'll be fine.
I don't know exactly which versions of lxml are affected, but I believe the problem was not so much with lxml itself as with the version of libxml2 used to build the Windows binary. (Certain versions of libxml2 had a problem with HTTP on Windows.)
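To see which lxml and libxml2 builds are actually installed before and after upgrading, something like the following sketch may help. It reads the version tuples that lxml.etree exposes; the function name lxml_build_info and the dictionary keys are assumptions made for illustration.

```python
def lxml_build_info():
    """Return the lxml and libxml2 versions as dotted strings.

    `lxml_build_info` is a hypothetical helper for this sketch; the
    version attributes it reads are provided by lxml.etree.
    """
    # Imported here so this module still loads when lxml is not installed.
    from lxml import etree

    def dotted(version_tuple):
        # Version attributes are tuples of ints, e.g. (2, 3, 0, 0).
        return ".".join(str(part) for part in version_tuple)

    return {
        "lxml": dotted(etree.LXML_VERSION),
        "libxml2 in use": dotted(etree.LIBXML_VERSION),
        "libxml2 compiled against": dotted(etree.LIBXML_COMPILED_VERSION),
    }
```

Printing the result of lxml_build_info() on the affected machine shows whether the libxml2 bundled with the Windows binary changed along with the lxml upgrade.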