I want to get the Product title from an Amazon product page in Google Sheets.
I have searched and tried different approaches and came up with this
=ImportXML("https://www.amazon.com/dp/B01MSR8J29","//*[@id='productTitle']")
I want to get the title: Army Flag Shirt: Become Brothers Army TShirt
But I get Error, Imported content is empty.
You can do this with just the
//title
xpath and a simple regexreplace to remove the standardAmazon.com:
portion:If you also want to remove the rest of the breadcrumb e.g.
: Clothing
just modify it to use theOR
symbol which in regex is|
like this:Much of HTML isn't valid XML, in particular Amazon's pages are not valid XML. So,
importXML
fails on them.You can use an Apps Script via a custom function as follows (remove the space before "amazon", it's here to prevent SO from rewriting the URL):
returns "Army Flag Shirt: Become Brothers Army TShirt", provided that the custom function is entered in Script Editor as follows:
Here, the first line gets the source of the page; then a regex extracts the item title.
You will find a similar post here, including the question of whether this activity is compliant with Amazon's Terms of Services.