Scrapy - how to convert string into an object whic

Posted 2019-07-19 12:26

Let's say I have some plain text in HTML-like format like this:

<div id="foo"><p id="bar">Some random text</p></div>

And I need to be able to run XPath on it to retrieve some inner element. How can I convert plain text to some kind of object which I could use XPath on?

Tags: xpath scrapy

3 answers

乱世女痞

#2 · 2019-07-19 12:39

Andersson already posted a solution to my question. Here is a second one I just discovered that also works. It uses Scrapy's own classes, so all the methods already familiar to a Scrapy user (e.g., extract(), extract_first()) are available.

from scrapy.http import HtmlResponse

text = """<div id="foo"><p id="bar">Some random text</p></div>"""
# First, encode the text to bytes
text_encoded = text.encode('utf-8')
# Now wrap it in an HtmlResponse object (the URL is only a placeholder)
text_in_html = HtmlResponse(url='some url', body=text_encoded, encoding='utf-8')
# Now we can use XPath as if the text were an ordinary HTML response
text_in_html.xpath('//p/text()').extract_first()


Lonely孤独者°

#3 · 2019-07-19 12:45

You can just use a normal Selector and run the same XPath or CSS queries on it directly:

from scrapy import Selector

...

sel = Selector(text='<div id="foo"><p id="bar">Some random text</p></div>')
selected_xpath = sel.xpath('//div[@id="foo"]')


疯言疯语

#4 · 2019-07-19 12:46

You can pass the HTML sample as a string to lxml.html and query it with XPath:

from lxml import html

code = """<div id="foo"><p id="bar">Some random text</p></div>"""
source = html.fromstring(code)
source.xpath('//div/p/text()')
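
As a rough sketch of how the lxml result might be handled: xpath() returns a list, so the first item has to be picked out explicitly. The names first_text, paragraph and paragraph_id are only illustrative.

```python
from lxml import html

code = '<div id="foo"><p id="bar">Some random text</p></div>'
source = html.fromstring(code)

# xpath() returns a list; text() nodes come back as string objects
texts = source.xpath('//div/p/text()')
first_text = texts[0] if texts else None

# Element results expose attributes and text directly
paragraph = source.xpath('//p[@id="bar"]')[0]
paragraph_id = paragraph.get('id')
paragraph_text = paragraph.text
```

Unlike Scrapy's selectors, there is no extract_first() here, so an empty list has to be guarded against by hand.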

