I want to get the Product title from an Amazon product page in Google Sheets.
I have searched and tried different approaches and came up with this
=ImportXML("https://www.amazon.com/dp/B01MSR8J29","//*[@id='productTitle']")
I want to get the title:
Army Flag Shirt: Become Brothers Army TShirt
But I get Error, Imported content is empty.
Demo Sheet
You can do this with just the //title
xpath and a simple regexreplace to remove the standard Amazon.com:
portion:
=REGEXREPLACE(ImportXML("https://www.amazon.com/dp/B01MSR8J29","//title"),"Amazon.com: ","")
If you also want to remove the rest of the breadcrumb e.g. : Clothing
just modify it to use the OR
symbol which in regex is |
like this:
=REGEXREPLACE(ImportXML("https://www.amazon.com/dp/B01MSR8J29","//title"),"Amazon.com: |: Clothing","")
Much of HTML isn't valid XML, in particular Amazon's pages are not valid XML. So, importXML
fails on them.
You can use an Apps Script via a custom function as follows (remove the space before "amazon", it's here to prevent SO from rewriting the URL):
=producttitle("https://www. amazon.com/dp/B01MSR8J29")
returns "Army Flag Shirt: Become Brothers Army TShirt", provided that the custom function is entered in Script Editor as follows:
function productTitle(url) {
var content = UrlFetchApp.fetch(url).getContentText();
var match = content.match(/<span id="productTitle".*>([^<]*)<\/span>/);
return match && match [1] ? match[1] : 'Title not found';
}
Here, the first line gets the source of the page; then a regex extracts the item title.
You will find a similar post here, including the question of whether this activity is compliant with Amazon's Terms of Services.