Assuming I have an Amazon product URL like so:
http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=0AY9N5GXRYHCADJP5P0V&pf_rd_t=101&pf_rd_p=500528151&pf_rd_i=507846
How could I scrape just the ASIN using JavaScript? Thanks!
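If the ASIN is always in that position in the URL, you can split the URL on "/" and take the piece at index 5. Passing that piece through decodeURIComponent is a precaution, though there's probably little chance of an ASIN getting %-escaped.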
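A minimal sketch of the positional approach, assuming the URL has exactly the shape shown in the question. The function name asinByPosition is hypothetical and only for illustration:

```javascript
// Hypothetical helper: assumes the ASIN is the sixth "/"-separated piece
// of the URL, as in the question's example.
function asinByPosition(url) {
  // Pieces: "http:", "", "www.amazon.com", "<title>", "dp", "<ASIN>", ...
  const segment = url.split('/')[5];
  // decodeURIComponent is a precaution; ASINs are unlikely to be %-escaped.
  return segment === undefined ? null : decodeURIComponent(segment);
}

const questionUrl =
  'http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER';
console.log(asinByPosition(questionUrl)); // B0015T963C
```

This breaks as soon as the title segment is missing (for example on the short /dp/ form), so it only suits URLs known to follow this layout.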
Since the ASIN is always a sequence of 10 letters and/or numbers immediately after a slash, try this:
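One way to write this, as a sketch rather than a definitive pattern (the function name asinByPattern is hypothetical):

```javascript
// Hypothetical helper: returns the first path segment made of exactly
// 10 letters and/or digits, or null if no such segment exists.
function asinByPattern(url) {
  // Built from a string so the pattern reads exactly as discussed here.
  const pattern = new RegExp('/([a-zA-Z0-9]{10})(?:[/?]|$)');
  const match = url.match(pattern);
  return match ? match[1] : null;
}

console.log(asinByPattern('http://www.amazon.com/dp/B0015T963C')); // B0015T963C
```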
The additional (?:[/?]|$) after the ASIN is to ensure that only a full path segment is taken.

This may be a simplistic approach, but I have yet to find an error in it using any of the URLs provided in this thread that people say are an issue.
I simply take the URL and split it on "/" to get the discrete parts, then loop through the resulting array and test each part against the regex. In my case, the variable i is an object with a property called RawURL, which holds the raw URL I am working with, and a property called VendorSKU, which I am populating.
So far, this has worked perfectly.
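The split-and-test loop described above might look roughly like this. The regex is an assumption (a segment of exactly 10 uppercase letters and digits), and fillVendorSku is a hypothetical name; only RawURL and VendorSKU come from the description:

```javascript
// Assumption: a whole segment of 10 uppercase letters/digits is the ASIN.
const ASIN_SEGMENT = /^[A-Z0-9]{10}$/;

// `item` stands in for the object called `i` in the description above.
function fillVendorSku(item) {
  const parts = item.RawURL.split('/');
  for (const part of parts) {
    if (ASIN_SEGMENT.test(part)) {
      item.VendorSKU = part;
      break; // stop at the first matching segment
    }
  }
  return item;
}

const sample = { RawURL: 'http://www.amazon.com/gp/product/B0015T963C', VendorSKU: null };
console.log(fillVendorSku(sample).VendorSKU); // B0015T963C
```

As written, a query string attached directly to the ASIN (for example B0015T963C?tag=x) would not match, so it would need to be stripped before the loop.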
Amazon's detail pages can have several forms, so to be thorough you should check for them all. These are all equivalent:
http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C
http://www.amazon.com/dp/B0015T963C
http://www.amazon.com/gp/product/B0015T963C
http://www.amazon.com/gp/product/glance/B0015T963C
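Going by the forms listed above, they always look like either .../dp/ASIN or .../gp/product/ASIN, optionally with an extra segment such as glance just before the ASIN.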
This should do it:
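A sketch of a pattern covering the four forms listed above. The names DETAIL_PAGE and asinFromDetailUrl are hypothetical, and the pattern assumes the www.amazon.com host used in the examples:

```javascript
// Optional title segment, then "dp" or "gp/product", an optional view
// segment (such as "glance"), then the 10-character ASIN.
const DETAIL_PAGE =
  /^https?:\/\/www\.amazon\.com\/(?:[\w-]+\/)?(?:dp|gp\/product)\/(?:\w+\/)?(\w{10})(?:[\/?]|$)/;

function asinFromDetailUrl(url) {
  const m = url.match(DETAIL_PAGE);
  return m ? m[1] : null;
}

console.log(asinFromDetailUrl('http://www.amazon.com/gp/product/glance/B0015T963C')); // B0015T963C
```

Anchoring on the dp and gp/product markers is stricter than matching any 10-character segment, at the cost of missing URL forms not listed here.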
None of the above works in all cases. I tried matching a number of URLs against the examples above.
This is the best I could come up with:
(?:[/dp/]|$)([A-Z0-9]{10})
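The full match will also include the preceding / in all cases; this can be removed later on, or avoided by reading the first capture group instead. Note that [/dp/] is a character class, so it matches a single /, d, or p rather than the literal text /dp/.

You can test it on: http://regexr.com/3gk2s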
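For illustration, here is one way to apply that pattern unchanged and read the ASIN from the capture group, so the leading character never needs trimming. The names BEST_EFFORT and asinFromBestEffort are hypothetical:

```javascript
// The pattern from this answer, built from a string so it stays unchanged.
const BEST_EFFORT = new RegExp('(?:[/dp/]|$)([A-Z0-9]{10})');

function asinFromBestEffort(url) {
  const m = BEST_EFFORT.exec(url);
  // m[0] includes the leading character; m[1] is the ASIN alone.
  return m ? m[1] : null;
}

console.log(asinFromBestEffort('http://www.amazon.com/dp/B0015T963C')); // B0015T963C
```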