Java Bing Image Search

Posted 2019-03-04 09:23

Question:

I have a small application in Java which searches images using Bing image search. The problem I am facing is that it's getting only the first 20 images. Maybe this is because when we search on bing.com it populates the first 20 images and then switches to an infinite scrolling feature.

Is there any way to search more than 20 images using bing?

Cheers :)

Answer 1:

I'm guessing this is because the site uses Ajax to populate the "infinite" scrolling list, as you call it.

You probably send an HTTP request and get the initial page (by the way, in my browser I got 6 images across by 4 down, i.e. 24, not 20; thinking about it, maybe my client also got only 20 at first and fetched the last 4 with Ajax...). To get the rest, you'd need to page through the results by issuing those Ajax requests yourself.

At a glance, the XHTML and associated JavaScript of the page are very dense and somewhat obfuscated; it would take a while to get oriented... An alternative to analyzing this page is to use a packet sniffer (such as Wireshark) and capture the requests that take place when you scroll down.

This will likely expose some form of Ajax request, which you can then easily emulate with Java. The Ajax response is typically easy to parse, whatever its format (XML, JSON, gzip-compressed...).
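Once the capture shows which request fires on scroll, replaying it from Java is an ordinary HTTP GET. Below is a minimal sketch using the `java.net.http.HttpClient` that ships with Java 11+. The endpoint URL, the `q`/`first` parameters, and the `User-Agent` string are placeholders or assumptions; substitute whatever your capture actually shows. Because `HttpClient` does not decompress responses on its own, the sketch gunzips the body itself when the server answers with `Content-Encoding: gzip`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

final class AjaxReplay {

    /** Appends URL-encoded parameters to base, in the map's iteration order. */
    static URI buildUri(String base, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return URI.create(query.isEmpty() ? base : base + "?" + query);
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // spaces become '+'
    }

    /** Builds a GET that mimics the captured browser request; header values are assumptions. */
    static HttpRequest buildRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("User-Agent", "Mozilla/5.0 (compatible; ImageCollector/0.1)") // hypothetical value
                .header("Accept-Encoding", "gzip") // HttpClient will NOT decode this for us
                .GET()
                .build();
    }

    /** Decodes the body as UTF-8, gunzipping first if the server declared gzip encoding. */
    static String decodeBody(byte[] body, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        return new String(body, StandardCharsets.UTF_8);
    }

    /** Sends the request and returns the decoded response body. */
    static String fetch(HttpClient client, HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<byte[]> resp = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        String encoding = resp.headers().firstValue("Content-Encoding").orElse("");
        return decodeBody(resp.body(), encoding);
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "max payne"); // search terms as seen in the capture
        params.put("first", "121");   // paging parameter as seen in the capture
        URI uri = buildUri("https://example.com/ajax-endpoint", params); // placeholder endpoint
        System.out.println(uri);
        // String body = fetch(HttpClient.newHttpClient(), buildRequest(uri));
    }
}
```

The parameters go into a `LinkedHashMap` so the replayed query string keeps the captured order, which makes it easier to diff your request against the browser's.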

A possible snag in this well-laid-out plan is if the data returned in the Ajax response is encrypted, for example if the extra images are bundled in some sort of envelope whose format you'll then need to discover.

Depending on the actual task at hand, you may try alternatives such as automation with Greasemonkey (on Firefox) or similar tools.

What about the Bing API?
Note that all the above approaches are akin to screen-scraping and hence quite sensitive to even minute changes in the Bing application and, depending on actual usage and context, could put the project in a legal grey area... A better approach may be to register and obtain a proper application ID with MS/Bing and to use the Bing API.



Answer 2:

Are you simulating a browser? Doesn't the Bing engine provide an entry point for programs instead, such as a web service? That would make your task much easier.


EDIT: The SDK appears to be here: http://msdn.microsoft.com/en-us/library/cc980922.aspx



Answer 3:

Just wanted to post a direct answer to the question: Bing uses Ajax (of course) for the infinite scroll. Each "tick" is a simple Ajax GET request that fetches new images.

For instance, this URL returns 30 results, starting at result 121, in "htmlraw" format for the query "max payne": http://www.bing.com/images/async?q=max+payne&format=htmlraw&first=121

Edit: It works with the original URL too; just add &first=NUMBER to the query string. Example: www.bing.com/images/search?q=payne&go=&form=QBLH&scope=images&filt=all&first=10
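Following this answer, paging comes down to advancing `first`. The sketch below builds successive `async` URLs, assuming 30 results per request (as in the "max payne" example) and that `first` is the index of the first result to return. The endpoint and its `format=htmlraw` parameter are undocumented observations, so the page size or even the URL may have changed; the same loop applies to the regular `images/search` URL by appending `&first=` to it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BingAsyncPager {

    // Observed, undocumented endpoint from the answer; Bing may change or block it at any time.
    private static final String ASYNC_BASE = "http://www.bing.com/images/async";

    /** Builds the URL for one "tick" of the infinite scroll, starting at result {@code first}. */
    static String pageUrl(String query, int first) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return ASYNC_BASE + "?q=" + q + "&format=htmlraw&first=" + first;
    }

    /** Returns the {@code first} values for {@code pages} consecutive requests of {@code pageSize} results. */
    static List<Integer> pageOffsets(int start, int pageSize, int pages) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            offsets.add(start + i * pageSize);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // URLs for the first 90 results, assuming 30 results per request.
        for (int first : pageOffsets(1, 30, 3)) {
            System.out.println(pageUrl("max payne", first));
        }
    }
}
```

Running `main` prints three URLs ending in `first=1`, `first=31` and `first=61`; each would then be fetched and its HTML scanned for image links.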

I am building my own bulk image collector (as a "learning project" for myself), and that is how I found out it is paginated like this.

FYI, Google and Bing are easy; Yahoo and AltaVista (redundant, since AltaVista's results come from Yahoo) are far more problematic, because they don't provide the direct link to the original image.

Have fun! :)



Answer 4:

This can be done using the count parameter. For example, I tried the call GET "https://api.cognitive.microsoft.com/bing/v7.0/images/search?q=shoes&mkt=en-us&count=30" and it returns 30 images.
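The same paging idea carries over to the official API: besides `count`, the v7 Image Search API accepts an `offset` parameter, so successive requests can step `offset` by `count`. Below is a minimal sketch with Java 11's `HttpClient`, assuming a subscription key sent in the `Ocp-Apim-Subscription-Key` header and read from a hypothetical `BING_SEARCH_KEY` environment variable. The JSON body is left unparsed to avoid committing to a particular JSON library; check the API reference for the maximum `count` allowed per request.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class BingImageApi {

    private static final String ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/images/search";

    /** Builds a GET for one page: {@code count} images starting at {@code offset}. */
    static HttpRequest buildRequest(String apiKey, String query, int count, int offset) {
        String url = ENDPOINT
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&mkt=en-us"
                + "&count=" + count
                + "&offset=" + offset;
        return HttpRequest.newBuilder(URI.create(url))
                .header("Ocp-Apim-Subscription-Key", apiKey) // key from your Bing API registration
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String key = System.getenv("BING_SEARCH_KEY"); // hypothetical variable name
        if (key == null) {
            System.err.println("Set BING_SEARCH_KEY to run this example.");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        // Two pages of 30 images each: offset 0, then offset 30.
        for (int offset = 0; offset < 60; offset += 30) {
            HttpResponse<String> resp = client.send(
                    buildRequest(key, "shoes", 30, offset), HttpResponse.BodyHandlers.ofString());
            System.out.println("offset " + offset + ": HTTP " + resp.statusCode());
            // resp.body() is JSON; parse it with the JSON library of your choice.
        }
    }
}
```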