Scrape ASIN from Amazon URL using JavaScript

Suppose I have an Amazon product URL like this:

http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=0AY9N5GXRYHCADJP5P0V&pf_rd_t=101&pf_rd_p=500528151&pf_rd_i=507846

How can I extract just the ASIN using JavaScript? Thanks!

Tags: javascript screen-scraping amazon-ec2

11 answers

我命由我不由天

#2 · 2019-03-08 06:44

@Gumbo: Your code works great!

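// JS test: run it in Firebug.

var url = window.location.href;
// match() compiles a string argument with new RegExp(), so the leading "/" is a literal slash.
var m = url.match("/([a-zA-Z0-9]{10})(?:[/?]|$)");

I've added a PHP function that does the same thing.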

function amazon_get_asin_code($url) {
    global $debug;

    $result = "";

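    // Use explicit "~" delimiters; escapeshellarg() is meant for shell arguments, not regex delimiters.
    // The leading "/" anchors the match to a whole path segment, as in the JS version.
    $pattern = '~/([a-zA-Z0-9]{10})(?:[/?]|$)~';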

    preg_match($pattern, $url, $matches);

    if($debug) {
        var_dump($matches);
    }

    if($matches && isset($matches[1])) {
        $result = $matches[1];
    } 

    return $result;
}

(This account has been banned)

#3 · 2019-03-08 06:44

This is my universal Amazon ASIN regexp:

~(?:\b)((?=[0-9a-z]*\d)[0-9a-z]{10})(?:\b)~i
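
Since the question asks for JavaScript, the same pattern can be written as a regex literal. This is only a minimal sketch: `findAsin` is a hypothetical helper name, and the pattern assumes every ASIN contains at least one digit, which is what the lookahead enforces.

```javascript
// The pattern above as a JavaScript regex literal (the ~...~i delimiters become /.../i).
// The lookahead (?=[0-9a-z]*\d) requires at least one digit in the 10-character run.
const ASIN_PATTERN = /\b((?=[0-9a-z]*\d)[0-9a-z]{10})\b/i;

// findAsin is a hypothetical helper name, not part of any library.
function findAsin(text) {
  const match = ASIN_PATTERN.exec(text);
  return match ? match[1] : null;
}

const exampleUrl =
  "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C/ref=amb_link_86123711_2";
console.log(findAsin(exampleUrl)); // B0015T963C
```

Because the pattern is not tied to a path segment, it returns the first 10-character alphanumeric token that contains a digit, wherever that token appears in the string.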

看我几分像从前

#4 · 2019-03-08 06:46

The Wikipedia article on ASIN (which I've linkified in your question) gives the various forms of Amazon URLs. You can fairly easily create a regular expression (or series of them) to fetch this data using the match() method.
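
A series of patterns tried in order might look like the sketch below. The two path forms (`/dp/` and `/gp/product/`) are assumptions chosen for illustration and are not a complete list, and `asinFromUrl` is a hypothetical helper name.

```javascript
// Path forms assumed for illustration; extend the list with whatever forms you need.
const ASIN_URL_PATTERNS = [
  /\/dp\/([A-Z0-9]{10})(?:[\/?]|$)/,
  /\/gp\/product\/([A-Z0-9]{10})(?:[\/?]|$)/,
];

// asinFromUrl is a hypothetical helper name, not part of any library.
function asinFromUrl(url) {
  for (const pattern of ASIN_URL_PATTERNS) {
    const m = url.match(pattern);
    if (m) {
      return m[1]; // first capture group holds the ASIN
    }
  }
  return null; // no known URL form matched
}

console.log(asinFromUrl("http://www.amazon.com/gp/product/B0015T963C?ref=x")); // B0015T963C
```

Anchoring each pattern to a known path prefix avoids matching unrelated 10-character segments elsewhere in the URL.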

smile是对你的礼貌

#5 · 2019-03-08 06:49

A small change to the regex from the first answer makes it work on all the URLs I have tested.

var url = "http://www.amazon.com/Kindle-Wireless-Reading-Display-Generation/dp/B0015T963C";
var m = url.match("/([a-zA-Z0-9]{10})(?:[/?]|$)");
console.log(m);
if (m) {
    console.log("ASIN=" + m[1]);
}

戒情不戒烟

#6 · 2019-03-08 06:51

Actually, the top answer fails on URLs like amazon.com/BlackBerry..., because BlackBerry is also 10 characters long.

One workaround, assuming the ASIN is always uppercase (as it is in URLs taken from Amazon), is to match only uppercase letters and digits. In Ruby:

    url.match("/([A-Z0-9]{10})")

I've found it to work on thousands of URLs.
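
For comparison, here is the same idea in JavaScript, the language the question asks about. It is a minimal sketch: the `/BlackBerry/dp/...` URL is a made-up example, and `looseAsin` and `strictAsin` are hypothetical helper names.

```javascript
// Made-up URL whose product slug is exactly 10 alphanumeric characters.
const testUrl = "http://www.amazon.com/BlackBerry/dp/B0015T963C";

// Mixed-case pattern from the earlier answers: the slug matches before the ASIN does.
function looseAsin(url) {
  const m = url.match(/\/([a-zA-Z0-9]{10})(?:[\/?]|$)/);
  return m ? m[1] : null;
}

// Uppercase-only pattern from this answer: the slug is skipped.
function strictAsin(url) {
  const m = url.match(/\/([A-Z0-9]{10})/);
  return m ? m[1] : null;
}

console.log(looseAsin(testUrl));  // BlackBerry
console.log(strictAsin(testUrl)); // B0015T963C
```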

看我几分像从前

#7 · 2019-03-08 06:52

Something like this should work (not tested):

var match = /\/dp\/(.*?)\/ref=amb_link/.exec(amazon_url);
var asin = match ? match[1] : '';
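
The regex above only matches when `/ref=amb_link` follows the ASIN. A different technique that drops that requirement is to parse the URL and take the path segment after `dp`. This is a sketch under stated assumptions: `asinFromDpPath` is a hypothetical helper name, and it assumes the ASIN is 10 uppercase letters or digits sitting directly after a `dp` segment.

```javascript
// asinFromDpPath is a hypothetical helper name, not part of any library.
function asinFromDpPath(amazonUrl) {
  // URL is the standard URL class available in browsers and Node.js;
  // it throws a TypeError if amazonUrl is not a valid absolute URL.
  const segments = new URL(amazonUrl).pathname.split("/");
  const dpIndex = segments.indexOf("dp");
  // Assumption: the ASIN is the path segment right after "dp".
  const candidate = dpIndex >= 0 ? segments[dpIndex + 1] : undefined;
  // Return '' on failure, matching the convention of the snippet above.
  return candidate && /^[A-Z0-9]{10}$/.test(candidate) ? candidate : "";
}

console.log(asinFromDpPath("http://www.amazon.com/dp/B0015T963C")); // B0015T963C
```

Working on the parsed pathname also keeps the query string out of the match entirely.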
