I am trying to get the latest reviews from the Google Play Store. I am following this question for getting the latest reviews here
The method specified in the above link's answer works fine in the scrapy shell, but when I try it in my crawler it is completely ignored.
Code snippet:
import re
import sys
import time
import urllib
import urlparse

from scrapy import Spider
from scrapy.spider import BaseSpider
from scrapy.http import Request, FormRequest
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.lxmlhtml import LxmlLinkExtractor

from play.items import PlayApp


class PlaySpider(CrawlSpider):
    name = "play"
    allowed_domains = ["play.google.com"]
    start_urls = [
        "https://play.google.com/store/apps"
    ]
    rules = (
        Rule(LxmlLinkExtractor(allow=('/store/apps$', )), callback='parseCategory', follow=True),
    )

    def parseCategory(self, response):
        """
        gets categories from the store home page and calls parseLinks for each category
        """
        #something here......
        yield Request(categoryapps, callback=self.parseLinks)

    def parseLinks(self, response):
        '''
        gets all the links from the category page and then
        passes individual links to the parseApp function.
        '''
        #something here
        yield Request(link, callback=self.parseApp)

    def parseApp(self, response):
        '''
        parses the app's page to get info about the app
        '''
        #application page parsing ......
        frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum": '0'}
        url = "https://play.google.com/store/getreviews"
        # POST the review form; parse_data should receive the response
        yield FormRequest(url, callback=self.parse_data, formdata=frmdata)
        yield app

    def parse_data(self, response):
        # do stuff with data...
        print '\n\n---------------I am here------------------\n\n'
The parse_data function is never called. I asked about this on the #scrapy IRC channel and in a few other places, but got no help. Any help would be appreciated.
This is the DEBUG output on the terminal:
DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=isoft.studios.ncert.ncertbooks)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=af.hindi.stories.booktwo)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=com.frozenex.latestnewsms)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=com.aqua.apps.english.hindi.dictionary)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=com.merriamwebster)
2015-06-03 13:56:08+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=an.HindiTranslate)
So the POST requests are indeed being sent and come back with status 200, but the callback method is not called.
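For reference, the form fields that go into each POST can be sketched without Scrapy. This is only an illustration under assumptions: `build_review_formdata` is a hypothetical helper name (it is not part of Scrapy or of the spider above), and it assumes the app id should come from the `id` query parameter of the details page URL, as seen in the referers in the log, rather than being hardcoded.

```python
try:
    from urllib.parse import urlparse, parse_qs, urlencode  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2
    from urllib import urlencode

REVIEWS_URL = "https://play.google.com/store/getreviews"


def build_review_formdata(details_url, page_num=0):
    """Hypothetical helper: build the getreviews form fields for one app.

    Assumes details_url looks like
    https://play.google.com/store/apps/details?id=<package name>
    """
    query = parse_qs(urlparse(details_url).query)
    app_ids = query.get("id")
    if not app_ids:
        raise ValueError("no 'id' query parameter in %r" % details_url)
    # All values are kept as strings, matching the frmdata dict in the spider.
    return {
        "id": app_ids[0],
        "reviewType": "0",
        "reviewSortOrder": "0",
        "pageNum": str(page_num),
    }


if __name__ == "__main__":
    formdata = build_review_formdata(
        "https://play.google.com/store/apps/details?id=com.merriamwebster"
    )
    # Show the URL-encoded body that would be sent to REVIEWS_URL.
    print(urlencode(sorted(formdata.items())))
```

Inside parseApp, the result could presumably be passed as `formdata=build_review_formdata(response.url)`, so that each POST carries the id of the app page that triggered it.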