How can I extract a JavaScript value in Scrapy?

Posted 2019-07-22 23:56

I am using Scrapy to crawl YouTube videos and I need the language of each video's title/description. When I view the page source of a video in a browser, I can see that a script tag contains the variable 'METADATA_LANGUAGE': 'no'. Can I extract this value with Scrapy and its extensions, or should I download and parse the HTML with a library like BeautifulSoup / htmlparser?

Tags: python scrapy

2 answers

ら.Afraid

#2 · 2019-07-23 00:27

Yes, this is possible using Scrapy. You could take a look at this question.

There are many ways to achieve what you're looking for. One is to get the <script> tag using Scrapy's selectors and then use a regex to pull out the specific METADATA_LANGUAGE variable you're looking for.
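The regex step can be sketched on its own with the standard library. This is a minimal sketch, assuming the variable appears as a quoted key with a quoted string value, as in the question; the helper name `extract_js_string` and the sample script text are made up for illustration.

```python
import re


def extract_js_string(script_text, name):
    """Return the quoted string assigned to `name` in JS source, or None.

    Hypothetical helper: assumes the form 'NAME': 'value' (single or
    double quotes, optional whitespace around the colon).
    """
    pattern = r"""['"]%s['"]\s*:\s*(['"])(.*?)\1""" % re.escape(name)
    match = re.search(pattern, script_text)
    # Group 1 is the opening quote of the value, group 2 the value itself.
    return match.group(2) if match else None


# Made-up script body resembling the snippet described in the question.
sample = "var cfg = {'FOO': 1, 'METADATA_LANGUAGE': 'no', 'BAR': 'x'};"
print(extract_js_string(sample, "METADATA_LANGUAGE"))  # prints: no
```

The script text itself would come from a Scrapy selector such as response.xpath('//script/text()'); the pattern may need adjusting if the page formats the value differently.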


看我几分像从前

#3 · 2019-07-23 00:35

Based on this, you can select the text of the script with XPath/CSS and then use a regex to search for the variable name. Assuming the first script contains METADATA_LANGUAGE:

items = response.xpath('//script/text()')[0].re(".*METADATA_LANGUAGE.*")
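The line above returns the whole matching line and depends on the variable being in the first script. A variant that searches every script and captures only the value could look like the sketch below; it assumes the 'METADATA_LANGUAGE': 'no' form from the question, and `parse_language` and `LANGUAGE_RE` are hypothetical names.

```python
# Assumed pattern: quoted key, colon, quoted value (single or double quotes).
LANGUAGE_RE = r"""['"]METADATA_LANGUAGE['"]\s*:\s*['"]([^'"]*)['"]"""


def parse_language(response):
    """Hypothetical spider callback; `response` is a Scrapy response."""
    # re_first() applies the regex to each selected <script> text node and
    # returns the first captured group, or None when nothing matches.
    language = response.xpath("//script/text()").re_first(LANGUAGE_RE)
    return {"url": response.url, "language": language}
```

In a spider, a function like this would serve as the callback of a Request, and a None language would mean the variable was not found in any inline script.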
