I want to programmatically find a list of URLs for similar images given an image URL. I can't find any free image search APIs, so I'm trying to do this by scraping Google's Search by Image.
If I have an image URL, say http://i.imgur.com/oLmwq.png, then navigating to https://www.google.com/searchbyimage?&image_url=http://i.imgur.com/oLmwq.png gives related images and info.
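One detail worth noting: the image URL is itself a query-string value, so it is safer to percent-encode it than to paste it in raw. Here is a minimal sketch in JavaScript (Node), assuming the `searchbyimage` endpoint and `image_url` parameter shown above behave as described; `buildSearchUrl` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: builds the Search by Image URL for a given image URL.
// The endpoint and the image_url parameter are taken from the question above;
// whether Google still serves this endpoint is an assumption.
function buildSearchUrl(imageUrl) {
  const search = new URL('https://www.google.com/searchbyimage');
  // searchParams.set percent-encodes the value, so characters such as
  // '&' or '?' inside imageUrl cannot break the outer query string.
  search.searchParams.set('image_url', imageUrl);
  return search.toString();
}

console.log(buildSearchUrl('http://i.imgur.com/oLmwq.png'));
```

Encoding matters most when the image URL carries its own query string, since an unencoded `&` would otherwise be read as the start of a new parameter.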
How do I get jsdom.env to produce the HTML your browser gets from the above URL?
Here's what I've tried (CoffeeScript):
jsdom = require 'jsdom'
url = 'https://www.google.com/searchbyimage?&image_url=http://i.imgur.com/oLmwq.png'
jsdom.env
  html: url
  scripts: [ "http://code.jquery.com/jquery.js" ]
  features:
    FetchExternalResources: ['script']
    ProcessExternalResources: ['script']
  done: (errors, window) ->
    console.log window.$('body').html()
The HTML this prints doesn't match what the browser shows. Is this an issue with jsdom's HTTP headers?
I find request + cheerio to be easier than jsdom for tasks like this. I see that you've found an answer already, but thought I'd mention it as an alternative solution.
Example:
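A rough sketch of the request + cheerio approach in JavaScript, under a few assumptions: the `request` and `cheerio` packages are installed from npm, the User-Agent string is just an illustrative browser-like value, and `searchOptions` and `fetchLinks` are hypothetical names. It collects every link on the result page rather than guessing at a selector for the similar-image results.

```javascript
// Hypothetical helper: request options for the Search by Image page.
// The User-Agent value is an illustrative browser-like string (an assumption).
function searchOptions(imageUrl) {
  return {
    url: 'https://www.google.com/searchbyimage',
    // request serialises `qs` into the query string and encodes the value.
    qs: { image_url: imageUrl },
    headers: {
      'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
        '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    },
  };
}

// Hypothetical helper: fetches the result page and calls back with its hrefs.
function fetchLinks(imageUrl, callback) {
  // Third-party packages (npm install request cheerio), loaded here so that
  // searchOptions stays usable on its own.
  const request = require('request');
  const cheerio = require('cheerio');

  request(searchOptions(imageUrl), (err, res, body) => {
    if (err) return callback(err);
    const $ = cheerio.load(body);
    // Collect every href on the page. Narrowing this down to the similar-image
    // results needs a selector checked against the live markup.
    const links = $('a')
      .map((i, el) => $(el).attr('href'))
      .get();
    callback(null, links);
  });
}
```

Calling `fetchLinks('http://i.imgur.com/oLmwq.png', (err, links) => console.log(err || links))` would print whatever links the page contains. Since cheerio only parses the HTML and does not run scripts, anything the page builds with JavaScript will not show up.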
The issue is jsdom's User-Agent HTTP header. Once that is set, (almost) everything works:
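To illustrate the idea without relying on a particular jsdom version's options, here is a sketch in JavaScript that swaps jsdom's own fetching for Node's built-in `https` module and sends a browser-like User-Agent. The header value and the names `BROWSER_UA`, `uaOptions` and `fetchHtml` are assumptions made for this example; the resulting HTML string can then be handed to jsdom or another parser.

```javascript
const https = require('https');

// Illustrative browser-like User-Agent (an assumption; any current browser
// string should serve the same purpose).
const BROWSER_UA =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Hypothetical helper: request options carrying the User-Agent header.
function uaOptions(userAgent) {
  return { headers: { 'User-Agent': userAgent } };
}

// Hypothetical helper: GETs an https page with the given User-Agent and
// resolves with the HTML body, following a limited number of redirects.
function fetchHtml(pageUrl, userAgent = BROWSER_UA, redirectsLeft = 5) {
  return new Promise((resolve, reject) => {
    const req = https.get(pageUrl, uaOptions(userAgent), (res) => {
      const { statusCode, headers } = res;
      if (statusCode >= 300 && statusCode < 400 && headers.location) {
        res.resume(); // discard the redirect body
        if (redirectsLeft === 0) return reject(new Error('Too many redirects'));
        // Location may be relative, so resolve it against the current URL.
        const next = new URL(headers.location, pageUrl).toString();
        return resolve(fetchHtml(next, userAgent, redirectsLeft - 1));
      }
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    });
    req.on('error', reject);
  });
}
```

A call such as `fetchHtml(url).then((html) => console.log(html.length))` makes it easy to compare what the server returns with and without the browser-like header.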
Which gives us a nice list of visually similar images. The only problem now is that jsdom throws an error after returning the result: