Use Google Sheets ImportXML with XPath to import A

I want to get the Product title from an Amazon product page in Google Sheets.

I have searched and tried different approaches and came up with this

=ImportXML("https://www.amazon.com/dp/B01MSR8J29","//*[@id='productTitle']")

I want to get the title: Army Flag Shirt: Become Brothers Army TShirt

But I get Error, Imported content is empty.

Demo Sheet

标签： xpath import google-sheets

2条回答

在下西门庆

2楼-- · 2019-09-20 18:15

You can do this with just the //title xpath and a simple regexreplace to remove the standard Amazon.com: portion:

=REGEXREPLACE(ImportXML("https://www.amazon.com/dp/B01MSR8J29","//title"),"Amazon.com: ","")

If you also want to remove the rest of the breadcrumb e.g. : Clothing just modify it to use the OR symbol which in regex is | like this:

=REGEXREPLACE(ImportXML("https://www.amazon.com/dp/B01MSR8J29","//title"),"Amazon.com: |: Clothing","")

0人赞添加讨论(0) 举报

男人必须洒脱

3楼-- · 2019-09-20 18:26

Much of HTML isn't valid XML, in particular Amazon's pages are not valid XML. So, importXML fails on them.

You can use an Apps Script via a custom function as follows (remove the space before "amazon", it's here to prevent SO from rewriting the URL):

=producttitle("https://www. amazon.com/dp/B01MSR8J29")

returns "Army Flag Shirt: Become Brothers Army TShirt", provided that the custom function is entered in Script Editor as follows:

function productTitle(url) {   
  var content = UrlFetchApp.fetch(url).getContentText();
  var match = content.match(/<span id="productTitle".*>([^<]*)<\/span>/);
  return match && match [1] ? match[1] : 'Title not found';
}

Here, the first line gets the source of the page; then a regex extracts the item title.

You will find a similar post here, including the question of whether this activity is compliant with Amazon's Terms of Services.

0人赞添加讨论(0) 举报

Use Google Sheets ImportXML with XPath to import A

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间