How can I extract a JavaScript value in Scrapy?

Posted 2019-07-22 23:56

I am using Scrapy to crawl YouTube videos, and I need the language of each video's title/description. When I view the page source of this video in the browser, I can see that a script tag contains the variable 'METADATA_LANGUAGE': 'no'. Can I extract this value with Scrapy and its extensions, or should I download the HTML and parse it with a library such as BeautifulSoup or HTMLParser?

Tags: python scrapy
2 answers
ら.Afraid
Reply #2 · 2019-07-23 00:27

Yes, this is possible using Scrapy. You could take a look at this question.

There are many ways to achieve what you're looking for. One is to select the <script> tag using Scrapy's selectors and then use a regular expression to pull out the specific METADATA_LANGUAGE variable.
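As a rough illustration of the regex step, here is a minimal sketch using only Python's standard `re` module. The `script_text` string is a made-up stand-in for the script body (the `OTHER_SETTING` key is hypothetical), and the pattern assumes the value is written with single quotes, as in the question:

```python
import re

# Hypothetical stand-in for the text of the <script> tag; on a real page this
# would be the string obtained from the selected script element.
script_text = """
var config = {
    'METADATA_LANGUAGE': 'no',
    'OTHER_SETTING': 'value'
};
"""

# Capture only the quoted value that follows the key (assumes single quotes).
pattern = re.compile(r"'METADATA_LANGUAGE'\s*:\s*'([^']*)'")

match = pattern.search(script_text)
language = match.group(1) if match else None
print(language)
```

Using a capturing group returns just the value rather than the whole matching line, and the `None` fallback covers pages where the variable is absent.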

看我几分像从前
Reply #3 · 2019-07-23 00:35

Based on this, you can select the script's text with XPath/CSS and then use a regex to search for the variable name. Assuming the first script contains METADATA_LANGUAGE:

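# re_first returns only the captured value (e.g. 'no'), or None if nothing matches
language = response.xpath('//script/text()')[0].re_first(r"'METADATA_LANGUAGE'\s*:\s*'([^']*)'")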
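If the variable is not guaranteed to sit in the first script, one option is to scan every script body instead. The helper below is only a sketch: `find_js_string` is a hypothetical name, not part of Scrapy, and it assumes the value is a simple quoted string with no escaped quotes inside it.

```python
import re
from typing import Iterable, Optional


def find_js_string(script_texts: Iterable[str], name: str) -> Optional[str]:
    """Return the first quoted value assigned to `name` in any script body."""
    # Accept single or double quotes around both the key and the value.
    pattern = re.compile(
        r"""['"]""" + re.escape(name) + r"""['"]\s*:\s*(['"])(.*?)\1"""
    )
    for text in script_texts:
        match = pattern.search(text)
        if match:
            return match.group(2)
    return None


# Hypothetical script bodies; in a spider these could come from
# response.xpath('//script/text()').getall()
scripts = [
    "var unrelated = 1;",
    "var cfg = {'METADATA_LANGUAGE': 'no'};",
]
print(find_js_string(scripts, "METADATA_LANGUAGE"))
```

Keeping the helper free of Scrapy objects makes it easy to test on plain strings, and it returns None rather than raising when no script defines the variable.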