I am using Scrapy to crawl YouTube videos, and I need the language of each video's title/description. When I view the page source of this video in a browser, I can see that a script tag contains the variable 'METADATA_LANGUAGE': 'no'. Can I extract this value with Scrapy and its extensions, or should I download and parse the HTML with libraries like BeautifulSoup / HTMLParser?
Yes, this is possible using Scrapy. You could take a look at this question.
There are many ways to achieve what you're looking for. One is to get the <script> tag using Scrapy's selectors and then use a regex to get the specific METADATA_LANGUAGE variable you're looking for. In other words, select the text of the script with XPath/CSS and then search it for the variable name with a regex. Assuming the first script contains METADATA_LANGUAGE, its text is the only thing the regex needs to look at.
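A minimal sketch of that approach, under some assumptions: the variable appears in the page as 'METADATA_LANGUAGE': 'no' (single or double quotes), and the helper name find_metadata_language and the yielded field names are made up for illustration. Instead of assuming the value is in the first script, it scans every script and returns the first match.

```python
import re

# Matches 'METADATA_LANGUAGE': 'no' with either quote style and captures the value.
METADATA_LANGUAGE_RE = re.compile(
    r"""['"]METADATA_LANGUAGE['"]\s*:\s*['"]([^'"]*)['"]"""
)


def find_metadata_language(script_texts):
    """Return the first METADATA_LANGUAGE value found in the given script texts, or None."""
    for text in script_texts:
        match = METADATA_LANGUAGE_RE.search(text)
        if match:
            return match.group(1)
    return None


def parse(response):
    # In a real spider this would be the `parse(self, response)` callback method.
    # Select the text of every <script> element with an XPath selector.
    script_texts = response.xpath("//script/text()").getall()
    yield {
        "url": response.url,
        "language": find_metadata_language(script_texts),
    }
```

On older Scrapy versions, .extract() plays the role of .getall(). Scrapy's selectors also offer a .re_first() shortcut, so response.xpath("//script/text()").re_first(pattern) should give the same result in one call. Either way there is no need for BeautifulSoup just to read this value.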