Scrapy add.xpath or join xpath

Posted 2019-08-19 07:18

I hope everyone is doing well.

I have this code (part of a spider). This is the last part of the scraping, where it starts to scrape and then writes to the CSV file. My doubt is this: is it possible to join or add XPath results together in the value written to the file? For example:

        <h5>Soundbooster</h5> <br><br>
          <p class="details">
            <b>Filtro attuale</b>
          </p>
          <blockquote>
            <p>
              <b>Catalogo:</b> 
                Aliant</br>
              <b>Marca e Modello:</b> 
                Mazda - 3 </br>
              <b>Versione:</b> 
                (3th gen) 2013-now (Petrol)
            </p>
          </blockquote>

I want to join the following into one field in the CSV file. It should be something like this:

Soundbooster per Mazda - 3 - (3th gen) 2013-now (Petrol)

And here is where I am lost. Is it possible? I don't know if I have to use add.xpath, join, or another method, or how to use it correctly.

This is part of my code:

def parse_content_details(self, response):

        exists = os.path.isfile("ntp/ntp_aliant.csv")
        with open("ntp/ntp_aliant.csv", "a+", newline='') as csvfile:
            fieldnames = ['*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)','*Category','*Title','Model','ConditionID','PostalCode',\
            'VATPercent','*C:Marca','Product:EAN','*C:MPN','PicURL', 'Description','*Format','*Duration','StartPrice','*Quantity','PayPalAccepted','PayPalEmailAddress',\
            'PaymentInstructions','*Location','ShippingService-1:FreeShipping', 'ShippingService-1:Option','ShippingService-1:Cost', 'ShippingService-1:Priority',\
             'ShippingService-2:Option','ShippingService-2:Cost','ShippingService-2:Priority','ShippingService-3:Option','ShippingService-3:Cost',\
             'ShippingService-3:Priority','ShippingService-4:Option','ShippingService-4:Cost','ShippingService-4:Priority','*DispatchTimeMax',\
             '*ReturnsAcceptedOption','ReturnsWithinOption','RefundOption','ShippingCostPaidByOption']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            if not exists:               
                writer.writeheader()

            for ntp in response.css('div.content-1col-nobox'):

                name = ntp.xpath('normalize-space(//h5/text())').extract_first()
                brand = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[4])').extract_first()
                version = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[6])').extract_first()
                result = response.xpath(name + " per " + brand + " - " + version)
                MPN = ntp.xpath('normalize-space(//tr[2]/td[1]/text())').extract_first()
                description = ntp.xpath('normalize-space(//div[6]/div[1]/div[2]/div/blockquote[2]/p/text())').extract_first()
                price = ntp.xpath('normalize-space(//tr[2]/td[@id="right_cell"][1])').extract()[0].split(None,1)[0].replace(",",".")
                picUrl = response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())

                writer.writerow({
                '*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)':'Add',\
                '*Category':'30895',\
                '*Title': name,\
                'Model': result,\
                'ConditionID': '1000',\
                'PostalCode':'154',\
                'VATPercent':'22',\
                '*C:Marca':'Priority Parts',\
                'Product:EAN':'',\
                '*C:MPN': MPN,\
                'PicURL': picUrl,\
                'Description': description,\
                '*Format' : 'FixedPrice',\
                '*Duration': 'GTC',\
                'StartPrice' : price,\
                '*Quantity':'3',\
                'PayPalAccepted': '1',\
                'PayPalEmailAddress' : 'your@gmail.com',\
                'PaymentInstructions' : 'your@gmail.com',\
                '*Location' : 'Italia',\
                'ShippingService-1:FreeShipping' : '1',\
                'ShippingService-1:Option' : 'IT_OtherCourier3To5Days',\
                'ShippingService-1:Cost' : '10',\
                'ShippingService-1:Priority' : '1',\
                'ShippingService-2:Option' : 'IT_QuickPackage3',\
                'ShippingService-2:Cost' : '15',\
                'ShippingService-2:Priority' : '2',\
                'ShippingService-3:Option': 'IT_QuickPackage1',\
                'ShippingService-3:Cost' : '12',\
                'ShippingService-3:Priority' : '3',\
                'ShippingService-4:Option': 'IT_Pickup',\
                'ShippingService-4:Cost' : '0',\
                'ShippingService-4:Priority' : '4',\
                '*DispatchTimeMax' : '5',\
                '*ReturnsAcceptedOption' : 'ReturnsAccepted',\
                'ReturnsWithinOption' : 'Days_14',\
                'RefundOption' : 'MoneyBackOrExchange',\
                'ShippingCostPaidByOption' : 'Buyer'})

Any help will be appreciated. Cheers. Valter.

1 answer
【Aperson】
#2 · 2019-08-19 07:55

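In the end @Casper was right; the correct answer was in the comments. The values returned by extract_first() are already plain strings, so they can be combined with ordinary string formatting instead of another XPath call: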

"{} per {} - {}".format(name, brand, version)

This is the final result:

            name = ntp.xpath('normalize-space(//h5/text())').extract_first()
            brand = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[4])').extract_first()
            version = ntp.xpath('normalize-space(//div/blockquote[1]/p//text()[6])').extract_first()
            result = ("{} per {} - {}".format(name, brand, version))

            writer.writerow({

            '*Title': result,\
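For anyone who wants to try the formatting step without running the spider, here is a minimal standalone sketch. The helper name build_title and the in-memory CSV buffer are assumptions made for illustration and are not part of the original spider; the input strings are the values from the HTML sample in the question. The fallback to an empty string is there because extract_first() can return None when a selector matches nothing.

```python
import csv
import io


def build_title(name, brand, version):
    """Combine three already-extracted strings into one CSV field."""
    # Hypothetical helper: treat a missing value (None) as an empty string
    # and trim stray whitespace before formatting.
    parts = [(p or "").strip() for p in (name, brand, version)]
    return "{} per {} - {}".format(*parts)


# Values as they would come back from the three selectors in the question.
title = build_title("Soundbooster", "Mazda - 3", "(3th gen) 2013-now (Petrol)")

# Write the combined value into a single column, using an in-memory buffer
# instead of a real file so the example has no side effects.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["*Title"])
writer.writeheader()
writer.writerow({"*Title": title})

print(title)
```

In the spider itself the same idea applies directly: build the string first, then pass it as the value for whichever column should hold it.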