Let's say I have some plain text in HTML-like format like this:
<div id="foo"><p id="bar">Some random text</p></div>
I need to be able to run XPath on it to retrieve some inner element. How can I convert the plain text into some kind of object that I can run XPath on?
Andersson already posted a solution to my question. This is a second one, which I just discovered, that works as well. It uses Scrapy's classes, so all the methods already familiar to a Scrapy user (e.g., extract(), extract_first(), etc.) are available.
You can just use a normal selector and run the same xpath and css queries on it directly.
You can pass the HTML code sample as a string to lxml.html and parse it with XPath.
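The lxml approach might look roughly like the sketch below. It assumes lxml is installed and reuses the sample markup from the question; the names `source`, `tree`, `matches`, and `first` are illustrative only:

```python
from lxml import html

# Sample markup from the question, held as a plain string.
source = '<div id="foo"><p id="bar">Some random text</p></div>'

# Parse the string into an element tree that supports XPath.
tree = html.fromstring(source)

# xpath() returns a list of matches; text() selects the text nodes.
matches = tree.xpath('//p[@id="bar"]/text()')

# Take the first match defensively, since the list can be empty.
first = matches[0] if matches else None

print(first)
```

Unlike the Scrapy selector, `xpath()` here returns a plain list, so picking the first result and handling the empty case is left to the caller.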