可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I want to get the Product title from an Amazon product page in Google Sheets.

I have searched and tried different approaches and came up with this

=ImportXML("https://www.amazon.com/dp/B01MSR8J29","//*[@id='productTitle']")

I want to get the title: Army Flag Shirt: Become Brothers Army TShirt

But I get Error, Imported content is empty.

Demo Sheet

回答1:

You can do this with just the //title xpath and a simple regexreplace to remove the standard Amazon.com: portion:

=REGEXREPLACE(ImportXML("https://www.amazon.com/dp/B01MSR8J29","//title"),"Amazon.com: ","")

If you also want to remove the rest of the breadcrumb e.g. : Clothing just modify it to use the OR symbol which in regex is | like this:

=REGEXREPLACE(ImportXML("https://www.amazon.com/dp/B01MSR8J29","//title"),"Amazon.com: |: Clothing","")

回答2:

Much of HTML isn't valid XML, in particular Amazon's pages are not valid XML. So, importXML fails on them.

You can use an Apps Script via a custom function as follows (remove the space before "amazon", it's here to prevent SO from rewriting the URL):

=producttitle("https://www. amazon.com/dp/B01MSR8J29")

returns "Army Flag Shirt: Become Brothers Army TShirt", provided that the custom function is entered in Script Editor as follows:

function productTitle(url) {   
  var content = UrlFetchApp.fetch(url).getContentText();
  var match = content.match(/<span id="productTitle".*>([^<]*)<\/span>/);
  return match && match [1] ? match[1] : 'Title not found';
}

Here, the first line gets the source of the page; then a regex extracts the item title.

You will find a similar post here, including the question of whether this activity is compliant with Amazon's Terms of Services.

Use Google Sheets ImportXML with XPath to import A

问题:

回答1:

回答2:

收藏的人(0)

Use Google Sheets ImportXML with XPath to import A

问题:

回答1:

回答2:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮