Scraping web content using XPath won't work

Posted 2019-10-20 14:39

I'm using XPath to scrape a specific Amazon web page, but it isn't working. Can anyone give me some advice? Here is the link to the page: link

I want to scrape this text: "Fun, credit card-sized prints". The code I'm using is:

from lxml import html
import requests

url = 'http://www.amazon.co.uk/dp/B009CX5VN2'
page = requests.get(url)
tree = html.fromstring(page.text)
feature_bullets = tree.xpath('//*[@id="feature-bullets"]/ul/li[1]/span/text()')

But feature_bullets is always empty. I really need some help.

Answer 1:

The HTML I downloaded does not have the structure you expect. This is the expression that works for me:

tree.xpath('//div[@id="technicalProductFeaturesATF"]/ul/li[1]/text()')

Complete program:

from lxml import html
import requests
from pprint import pprint

url = 'http://www.amazon.co.uk/dp/B009CX5VN2'
page = requests.get(url)
tree = html.fromstring(page.text)
feature_bullets = tree.xpath('//div[@id="technicalProductFeaturesATF"]/ul/li/text()')

pprint(feature_bullets)

Result:

$ python foo.py 
['Fun, credit card-sized prints',
 'LCD film counter and shooting mode display',
 'Camera mounted mirror for self portraits',
 'Powered by CR2 Batteries, Built-in, Automatic electronic flash',
 'Fujifilm Instax Mini 25 + 30 Instax Mini Film']
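
Since the served HTML can differ from what the browser shows, it may help to check which ids are actually present before settling on an expression. The following is only a rough sketch under stated assumptions: the helper names `extract_bullets` and `list_ids` and the `sample` string are hypothetical, and `sample` stands in for `page.text` so that the code runs without a network request. It tries the expression from the question first, then the one from this answer.

```python
from lxml import html

# Candidate XPath expressions, tried in order. The first is adapted from the
# question, the second is the one from this answer. The real page layout may
# differ again, so treat this list as an assumption.
CANDIDATE_XPATHS = [
    '//*[@id="feature-bullets"]/ul/li/span/text()',
    '//div[@id="technicalProductFeaturesATF"]/ul/li/text()',
]


def extract_bullets(html_text):
    """Return the bullet texts from the first candidate XPath that matches."""
    tree = html.fromstring(html_text)
    for expression in CANDIDATE_XPATHS:
        # Strip whitespace and drop text nodes that are empty after stripping.
        bullets = [text.strip() for text in tree.xpath(expression) if text.strip()]
        if bullets:
            return bullets
    return []


def list_ids(html_text):
    """List every id attribute in the document, to see what was served."""
    return html.fromstring(html_text).xpath('//@id')


# Hypothetical stand-in for page.text.
sample = """
<html><body>
  <div id="technicalProductFeaturesATF">
    <ul>
      <li>Fun, credit card-sized prints</li>
      <li>LCD film counter and shooting mode display</li>
    </ul>
  </div>
</body></html>
"""

print(list_ids(sample))
print(extract_bullets(sample))
```

With a real response, passing `page.text` to `list_ids` shows whether `feature-bullets` exists at all in the downloaded document. An empty list from `extract_bullets` means none of the candidate expressions matched.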

Source: Scraping web content using xpath won't work