Python + Scrapy + JSON + XPath : How to scrape JSO

Posted 2019-07-29 23:40

I know how to write XPaths for HTML data points with Scrapy. But I need to scrape all the URLs (the starting URLs) from this page, where they are written in JSON format:

https://highape.com/bangalore/all-events

view-source:https://highape.com/bangalore/all-events

I usually write the parse method in this format:

def parse(self, response):
    events = response.xpath('**What To Write Here?**').extract()

    for event in events:
        absolute_url = response.urljoin(event)
        yield Request(absolute_url, callback=self.parse_event)

Please tell me what I should write in the 'What To Write Here?' part.

Tags: python json xpath scrapy

2 Answers

We Are One

#2 · 2019-07-30 00:31

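What to write here? Select the text inside the JSON-LD script tag and parse it with json (this needs "import json" at the top of the spider). Without /text(), extract() returns the whole <script> element including its tags, and json.loads fails on it:

events = response.xpath("//script[@type='application/ld+json']/text()").extract()
events = json.loads(events[0])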
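
The parsed JSON still has to be turned into URLs before the loop in the question can use it. Here is a minimal sketch, assuming each JSON-LD entry is an object with a "url" key (as the loop in the other answer suggests). The helper name extract_event_urls and the sample data are hypothetical, not taken from the site:

```python
import json


def extract_event_urls(ld_json_texts):
    """Collect "url" values from a list of JSON-LD script bodies.

    Assumption: each body is either a JSON array of objects or a single
    object, and the event link is stored under the "url" key.
    """
    urls = []
    for text in ld_json_texts:
        try:
            # strict=False tolerates raw control characters inside strings
            data = json.loads(text, strict=False)
        except json.JSONDecodeError:
            continue  # skip script blocks that are not valid JSON
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("url"):
                urls.append(item["url"])
    return urls


# Hypothetical sample standing in for the page's script contents
sample = [
    '[{"@type": "Event", "url": "/bangalore/events/a"},'
    ' {"@type": "Event", "url": "/bangalore/events/b"}]'
]
print(extract_event_urls(sample))
```

Inside parse, this could be called with the list returned by response.xpath("//script[@type='application/ld+json']/text()").extract(), and each returned URL passed through response.urljoin before yielding the Request.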


虎瘦雄心在

#3 · 2019-07-30 00:40

View the page source of the URL, copy lines 76-9045, and save them as data.json on your local drive. Then use this code...

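import json

import requests
from bs4 import BeautifulSoup

# Fetch the page and parse the HTML
req = requests.get('https://highape.com/bangalore/all-events')
soup = BeautifulSoup(req.content, 'html.parser')
# This answer assumes the sixth <script> tag holds the JSON data
js = soup.find_all('script')[5].text
# strict=False allows control characters inside JSON strings
data = json.loads(js, strict=False)
for i in data:
    url = i['url']
    print(url)
    # callback with Scrapy goes here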
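
Indexing into find_all('script') with [5] stops working as soon as the page adds or removes a script tag. One alternative is to select the block by its type attribute instead. The following is a minimal sketch that uses the standard library's html.parser in place of BeautifulSoup; the class name LdJsonCollector and the sample HTML are hypothetical:

```python
from html.parser import HTMLParser


class LdJsonCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished script bodies
        self._inside = False   # True while inside a matching <script>
        self._parts = []       # text chunks of the current script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._parts = []

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._parts))
            self._inside = False


# Hypothetical page with one ordinary script and one JSON-LD script
sample_html = (
    "<html><head><script>var x = 1;</script>"
    '<script type="application/ld+json">[{"url": "https://example.com/e1"}]</script>'
    "</head></html>"
)
collector = LdJsonCollector()
collector.feed(sample_html)
print(collector.blocks)
```

With BeautifulSoup itself, the same selection can be written as soup.find_all('script', type='application/ld+json'), which avoids depending on the position of the tag.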
