Excluding certain elements from being scraped

Posted 2019-07-17 21:47

I am trying to exclude certain elements from a list.

On the page http://www.persimmonhomes.com/rooley-park-10126 there are elements I want to scrape (div class="housetype js-filter-housetype") and elements I don't want to scrape (div class="housetype js-filter-housetype" style="display: none;").

The HTML looks something like this:

<div class="housetype js-filter-housetype"> 
<div class="housetype js-filter-housetype"> 
<div class="housetype js-filter-housetype"> 
<div class="housetype js-filter-housetype">
<div class="housetype js-filter-housetype"> 
<div class="housetype js-filter-housetype" style="display: none;">
<div class="housetype js-filter-housetype" style="display: none;">

I am trying to write code that excludes the divs with class="housetype js-filter-housetype" style="display: none;".

My current code to do this is:

import scrapy

from persimmon.items import PersimmonItem  # assumed project/module name


class PersimmonSpider(scrapy.Spider):
    name = "persimmon"  # assumed spider name
    start_urls = [
        "http://www.persimmonhomes.com/rooley-park-10126",
    ]

    def parse(self, response):
        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
            item = PersimmonItem()
            item['housetypeheading'] = sel.xpath('//*[@class="houses-list js-scrollable js-filterable js-houselist"]//*[not(@style="display: none;")]/h2[@class="housetype__heading"]').extract()
            yield item

So far this does not work: it scrapes all the elements, whether or not they have style="display: none;". I have also tried [not(contains(@style, "display: none;"))], but with no luck so far.

Does anyone have any ideas?

1 answer

Reply #2 · 2019-07-17 22:09 (the answerer's account has been banned)

If you want to ignore every matching div that has a style attribute at all:

"//div[@class='housetype js-filter-housetype' and not(@style)]"

Or, to ignore only that particular style, add the condition to the predicate with and:

"//div[@class='housetype js-filter-housetype' and not(contains(@style,'display: none;'))]"
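To check the answer's logic offline, here is a minimal sketch using Python's standard-library xml.etree.ElementTree against a made-up, well-formed version of the question's markup (the outer houses-list wrapper, the visible_headings helper and the heading texts are hypothetical). ElementTree supports only a subset of XPath, with no not() or contains(), so the class match stays in the path and the style test from the second expression is done in Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the markup from the question.
# The heading texts are made up for illustration.
SAMPLE = """
<div class="houses-list">
  <div class="housetype js-filter-housetype"><h2 class="housetype__heading">House A</h2></div>
  <div class="housetype js-filter-housetype"><h2 class="housetype__heading">House B</h2></div>
  <div class="housetype js-filter-housetype" style="display: none;"><h2 class="housetype__heading">House C</h2></div>
  <div class="housetype js-filter-housetype"><h2 class="housetype__heading">House D</h2></div>
  <div class="housetype js-filter-housetype" style="display: none;"><h2 class="housetype__heading">House E</h2></div>
</div>
"""

# Exact class match, as in the answer's expressions.
HOUSETYPE = ".//div[@class='housetype js-filter-housetype']"


def visible_headings(markup):
    """Return heading texts of house-type divs not hidden with display: none."""
    root = ET.fromstring(markup.strip())
    headings = []
    for div in root.iterfind(HOUSETYPE):
        # Python equivalent of not(contains(@style, 'display: none;')):
        # a div with no style attribute counts as visible.
        if "display: none;" in (div.get("style") or ""):
            continue
        h2 = div.find("h2[@class='housetype__heading']")
        if h2 is not None:
            headings.append(h2.text)
    return headings


print(visible_headings(SAMPLE))  # ['House A', 'House B', 'House D']
```

In Scrapy itself no such workaround is needed, because response.xpath() accepts the answer's full expression, for example "//div[@class='housetype js-filter-housetype' and not(contains(@style,'display: none;'))]//h2[@class='housetype__heading']/text()". If the hidden house types still come back, check whether style="display: none;" is present in the raw HTML at all: Scrapy does not execute JavaScript, so a style that a script adds in the browser will not be in the downloaded response. Also note that inside a loop, sel.xpath('//...') searches the whole document; a path starting with './/' stays relative to sel.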