Reverse image scraping with PHP

2019-05-06 06:18发布

问题:

I need to fetch some images using google Reverse image search which is not supported by the API, but thankfully, you can query google with a direct link to the image and it still shows results, so:

$googleURL = "https://www.google.com/searchbyimage?&image_url=".$imageURL;
echo $googleURL;

Output:

https://www.google.com.au/search?tbs=sbi:AMhZZiu9rNRW4ETWGjN9XYQKsa21UHM7j_1TjMjXvYyNH1knVTyMGZGNmS2yme4CsQb0T7UViTyNrG4e8u_1xLY-dZCU16wkfdUakeY7idDwyMge78nT--Grpll4t9_1fp4YPTsJyKRUANzw1Iyctsko7OZbkYES3VUHtyNy9l9RJf12YOdEvVOxSZCO6-JPxO0PpZ5p79Rr-eDUrqENWYVbk4qojafKMTVfuXvoACQ9iykI-DMVbP9n_1o0YkdKTdUeK2r30wg4Oe2BqspoXlI_11rxySuK6TolPM6z58E6erTT0bnYfXTlyDMBfOwgSfhbn2ipLrNHgNdqyk-YhmMP0_1ZzqVyZrgMz-I5cfH9N65nX6bhZfos0lgr8_15V6ZHtX0_1p8s5r229JDrwzlwnjwOBLgP1inmEORCaKOlcfHbyPnU3n04pIfLGu5fWYpbmFJwtK_1vaJvS0uFb6Pkh_1uv0wvz_10yf4O6E1IvBSoMudcYy4cmJ1zegJJ9L50C0bzXFIRUb62lcPJWbkZNR44Tz378nOSXd-PND0JfKQ-TujT3KfC_1O241knvr9Eb3LbuvncGiCMoPgxlUY4r9B_1KWchNWhJVTJz9omeiygwz5K_13YkjuLg52UF6YWvLedCxgRoUpuj9kFdmYt-b9Tn2VEZG8yfiLm3OTkZnlVYtPF87LLQAHH24VpLMoV0oDllHDK3xOXhvusl_1K2Me9tTdK15PPG7oreeWfYRztQwTpG4iB5GAnaj687OQukvxX5hNFIqXx_1QSuNooDhIP1eJl-6QYfuI4MPasj6flSMom7HYTSjyjcsQKw0Prj1bBsJY6qH1qyLrF1f1_1Ql0COERnbOV7O5mTOuTkNWarmR5wzE06qbgsrtT95ENqafd81ppHbA0Jyg-xQ8TLV-dSp1QDAtiYAHI_11tCwsDtrak4jDS4qAfEJCw_1lb9urJqqajvp25jLH2_1mN3u0eeW7xNF-PljofyhI0iIWYSg6ghyOVRIaT_1c6klKUPvOrquZy8hMCZWHb3CYZNGJeKTnACCyYW1MNVUsYnoFWORN6hvkVlUk0beFXvA_1W2vaoedLjj-fN1y8_1dPOiBROLYtv85nq01csCKk7Eib6p2b_131wEeQBYocoYU0sGTv2_1dhOvSXRPGTnrbZlNDbJFUtH4pF9tMQj5-Fh_1lw9TTXGCjQ9UjOSLD5q7tNjCQU1As1uCQBvmZvxo7J3gZSAcj_19wXfHZCOsA8g-WA97V-2b62ia4RFOehQ38hoXoK7MCSDLnVtJTsKQz9HuEreXm8qGQlbDzfr7JFuHHe2MOyChwnL_1gzRnZd8uv2OIM0nzKh_1wg4T1KCXv3NSGNkSyNxpYXFJ161Sv3NpQQI3epBMiYA_1AcQDiCxOTQvWj00e5EXaXN22CDRWRq3uk4HWj2eXcR6-TGmsYEfSGX9nyQwK1DHp9yaNjk9Bal7rNHUAe_1eMDsCWW9htaLyiMTio0eXyTumVrlt7ShZVd8oSPOj8U0ilY9owH95jz7LsI8vUnzF-FC2m_1yNt3xe4ZAcsRTbYQXTN3Ga76vTQBPu8oz0gkYmDTA&gws_rd=cr&ei=wAHVVJOVLIeeugSZ64A4

.. now on this page, I need to follow the link to the actual results page, so my condition would look like:

if a.text == 'Large' 
elseif a.text == 'Medium'
elseif a.text == 'Visually similar images'{
    // crawl the link
    // get direct links of top 10 results  
}

But I'm not sure how to:

  1. get the href if the condition a.text == 'Large' is met since Simple HTML DOM Parser or PHPQuery neither have this like jQuery.
  2. On fetching the results page, how to trigger a mousedown even to get the full-size image URLS because this is what I see in the source: jsaction="mousedown:irc.rl;keydown:irc.rlk"

Here's quick screencast of what I'm looking to do: https://www.dropbox.com/s/c8g7fs5m5zqcegb/2015-02-07_08-56-23.mp4?dl=0 (5.9mb)

回答1:

  1. You can use Regular Expresssions to find the matching link.
  2. If you check the HTML Code of the Google Image Search a bit more closely, you see there is actually also a href param in the Link (Example) which you can follow with just another crawl. There you can parse out the large image again with Regular Expressions.