Scrape ASIN from Amazon URL using JavaScript

Posted 2019-03-08 06:23

Assuming I have an Amazon product URL like so

http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=0AY9N5GXRYHCADJP5P0V&pf_rd_t=101&pf_rd_p=500528151&pf_rd_i=507846

How could I scrape just the ASIN using JavaScript? Thanks!

11 answers
我命由我不由天
#2 · 2019-03-08 06:44

@Gumbo: Your code works great!

// JS test: try it in the Firebug console.

var url = window.location.href;
url.match("/([a-zA-Z0-9]{10})(?:[/?]|$)");

I added a PHP function that does the same thing.

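function amazon_get_asin_code($url) {
    global $debug;

    $result = "";

    // "~" is the regex delimiter. The leading "/" ties the match to a whole
    // path segment, so 10-letter words such as "Generation" are skipped.
    $pattern = '~/([a-zA-Z0-9]{10})(?:[/?]|$)~';

    preg_match($pattern, $url, $matches);

    // Dump the raw matches when the global $debug flag is set.
    if ($debug) {
        var_dump($matches);
    }

    // Capture group 1 holds the ASIN.
    if ($matches && isset($matches[1])) {
        $result = $matches[1];
    }

    return $result;
}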
该账号已被封号
#3 · 2019-03-08 06:44

This is my universal Amazon ASIN regexp:

~(?:\b)((?=[0-9a-z]*\d)[0-9a-z]{10})(?:\b)~i
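
Since the question asks for JavaScript, the pattern above can be ported to a regex literal. This is a minimal sketch, not a definitive implementation: the helper name findAsinCandidate is hypothetical, and it returns the first 10-character alphanumeric token that contains a digit, so it can still pick up a non-ASIN token of that shape.

```javascript
// Hypothetical helper: JavaScript port of the PCRE pattern above.
// The lookahead requires at least one digit in the token, which
// skips 10-letter words such as "Generation".
function findAsinCandidate(url) {
  var match = /\b((?=[0-9a-z]*\d)[0-9a-z]{10})\b/i.exec(url);
  return match ? match[1] : null;
}

// URL taken from the question (query string shortened).
var exampleUrl = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2";
console.log(findAsinCandidate(exampleUrl)); // "B0015T963C"
```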
看我几分像从前
#4 · 2019-03-08 06:46

The Wikipedia article on ASIN (which I've linkified in your question) gives the various forms of Amazon URLs. You can fairly easily create a regular expression (or series of them) to fetch this data using the match() method.
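
A "series of regular expressions" could look roughly like the sketch below. The path prefixes in ASIN_PATTERNS and the helper name extractAsin are assumptions for illustration (they are commonly seen URL shapes, not a verified or complete list), so adjust them to the forms you actually encounter.

```javascript
// Hypothetical list of path prefixes that precede the ASIN.
// These are assumptions, not an exhaustive list of Amazon URL forms.
var ASIN_PATTERNS = [
  /\/dp\/([A-Z0-9]{10})(?:[\/?#]|$)/i,
  /\/gp\/product\/([A-Z0-9]{10})(?:[\/?#]|$)/i,
  /\/exec\/obidos\/ASIN\/([A-Z0-9]{10})(?:[\/?#]|$)/i,
  /\/o\/ASIN\/([A-Z0-9]{10})(?:[\/?#]|$)/i
];

// Try each pattern in turn and return the first captured ASIN, or null.
function extractAsin(url) {
  for (var i = 0; i < ASIN_PATTERNS.length; i++) {
    var match = url.match(ASIN_PATTERNS[i]);
    if (match) {
      return match[1];
    }
  }
  return null;
}

console.log(extractAsin("http://www.amazon.com/gp/product/B0015T963C"));
```

Keeping one pattern per URL form makes it easy to add or drop a form without touching the others.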

smile是对你的礼貌
#5 · 2019-03-08 06:49

With a small change to the regex from the first answer, it works on all the URLs I have tested.

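var url = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C";
// The string is converted to a RegExp; the leading "/" is a literal slash.
var m = url.match("/([a-zA-Z0-9]{10})(?:[/?]|$)");
console.log(m);
if (m) {
    // Capture group 1 holds the ASIN.
    console.log("ASIN=" + m[1]);
}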

戒情不戒烟
#6 · 2019-03-08 06:51

Actually, the top answer doesn't work if the URL is something like amazon.com/BlackBerry... (since BlackBerry is also 10 characters).

One workaround, assuming the ASIN is always upper case (as it is in URLs taken from Amazon), is this (in Ruby):

        url.match("/([A-Z0-9]{10})")

I've found it to work on thousands of URLs.
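
The same idea in JavaScript might look like the sketch below. The helper name extractUppercaseAsin and the sample URL are made up for illustration, and unlike the Ruby one-liner this version also requires the segment to end after 10 characters.

```javascript
// Hypothetical helper: uppercase-only variant, assuming ASINs in
// URLs are always upper case.
function extractUppercaseAsin(url) {
  var match = url.match(/\/([A-Z0-9]{10})(?:[\/?]|$)/);
  return match ? match[1] : null;
}

// Made-up URL for illustration only.
var blackberryUrl = "http://www.amazon.com/BlackBerry/dp/B0015T963C";

// The mixed-case pattern stops at the 10-letter slug.
var mixedCase = blackberryUrl.match(/\/([a-zA-Z0-9]{10})(?:[\/?]|$)/);
console.log(mixedCase[1]);                        // "BlackBerry"

// The uppercase-only pattern skips the slug and finds the ASIN.
console.log(extractUppercaseAsin(blackberryUrl)); // "B0015T963C"
```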

看我几分像从前
#7 · 2019-03-08 06:52

Something like this should work (not tested):

var match = /\/dp\/(.*?)\/ref=amb_link/.exec(amazon_url);
var asin = match ? match[1] : '';
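
Note that the pattern above only matches when "/ref=amb_link" follows the ASIN. A slightly more general sketch, assuming the ASIN is simply the path segment after "/dp/" (asinFromDpPath is a hypothetical name), stops at the next "/", "?" or "#" instead:

```javascript
// Hypothetical helper: capture the path segment that follows "/dp/",
// whatever comes after it.
function asinFromDpPath(amazonUrl) {
  var match = /\/dp\/([^\/?#]+)/.exec(amazonUrl);
  return match ? match[1] : '';
}

console.log(asinFromDpPath("http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C"));
```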