I hope everyone is doing well.
I have this code(part of it) for a spider, now this is the last part of the scraping, here it start to scrape and then write in the csv file, so I got this doubdt, it is possible to join or add xpath with the result printed in the file, for example:
<h5>Soundbooster</h5> <br><br>
<p class="details">
<b>Filtro attuale</b>
</p>
<blockquote>
<p>
<b>Catalogo:</b>
Aliant</br>
<b>Marca e Modello:</b>
Mazda - 3 </br>
<b>Versione:</b>
(3th gen) 2013-now (Petrol)
</p>
</blockquote>
I want to join the following for one field in the csv file, should be something like this:
Soundbooster per Mazda - 3 - (3th gen) 2013-now (Petrol)
And here it is where I am lost, It is possible? I don't know if I have to use add.xpath or join or another method and how to use it right.
This is part of my code:
def parse_content_details(self, response):
exists = os.path.isfile("ntp/ntp_aliant.csv")
with open("ntp/ntp_aliant.csv", "a+", newline='') as csvfile:
fieldnames = ['*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)','*Category','*Title','Model','ConditionID','PostalCode',\
'VATPercent','*C:Marca','Product:EAN','*C:MPN','PicURL', 'Description','*Format','*Duration','StartPrice','*Quantity','PayPalAccepted','PayPalEmailAddress',\
'PaymentInstructions','*Location','ShippingService-1:FreeShipping', 'ShippingService-1:Option','ShippingService-1:Cost', 'ShippingService-1:Priority',\
'ShippingService-2:Option','ShippingService-2:Cost','ShippingService-2:Priority','ShippingService-3:Option','ShippingService-3:Cost',\
'ShippingService-3:Priority','ShippingService-4:Option','ShippingService-4:Cost','ShippingService-4:Priority','*DispatchTimeMax',\
'*ReturnsAcceptedOption','ReturnsWithinOption','RefundOption','ShippingCostPaidByOption']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
if not exists:
writer.writeheader()
for ntp in response.css('div.content-1col-nobox'):
name = ntp.xpath('normalize-space(//h5/text())').extract_first()
brand = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[4])').extract_first()
version = ntp.xpath('normalize-space(//div/blockquote[1]/p/text()[6])').extract_first()
result = response.xpath(name + " per " + brand + " - " + version)
MPN = ntp.xpath('normalize-space(//tr[2]/td[1]/text())').extract_first()
description = ntp.xpath('normalize-space(//div[6]/div[1]/div[2]/div/blockquote[2]/p/text())').extract_first()
price = ntp.xpath('normalize-space(//tr[2]/td[@id="right_cell"][1])').extract()[0].split(None,1)[0].replace(",",".")
picUrl = response.urljoin(ntp.xpath('//div/p[3]/img/@src').extract_first())
writer.writerow({
'*Action(SiteID=Italy|Country=IT|Currency=EUR|Version=745|CC=UTF-8)':'Add',\
'*Category':'30895',\
'*Title': name,\
'Model': result,\
'ConditionID': '1000',\
'PostalCode':'154',\
'VATPercent':'22',\
'*C:Marca':'Priority Parts',\
'Product:EAN':'',\
'*C:MPN': MPN,\
'PicURL': picUrl,\
'Description': description,\
'*Format' : 'FixedPrice',\
'*Duration': 'GTC',\
'StartPrice' : price,\
'*Quantity':'3',\
'PayPalAccepted': '1',\
'PayPalEmailAddress' : 'your@gmail.com',\
'PaymentInstructions' : 'your@gmail.com',\
'*Location' : 'Italia',\
'ShippingService-1:FreeShipping' : '1',\
'ShippingService-1:Option' : 'IT_OtherCourier3To5Days',\
'ShippingService-1:Cost' : '10',\
'ShippingService-1:Priority' : '1',\
'ShippingService-2:Option' : 'IT_QuickPackage3',\
'ShippingService-2:Cost' : '15',\
'ShippingService-2:Priority' : '2',\
'ShippingService-3:Option': 'IT_QuickPackage1',\
'ShippingService-3:Cost' : '12',\
'ShippingService-3:Priority' : '3',\
'ShippingService-4:Option': 'IT_Pickup',\
'ShippingService-4:Cost' : '0',\
'ShippingService-4:Priority' : '4',\
'*DispatchTimeMax' : '5',\
'*ReturnsAcceptedOption' : 'ReturnsAccepted',\
'ReturnsWithinOption' : 'Days_14',\
'RefundOption' : 'MoneyBackOrExchange',\
'ShippingCostPaidByOption' : 'Buyer'})
Any help will be appreciate it. Cheers. Valter.
At the end @Casper was right, in the comments we see the right answer
This is the final result: