Flickr API returning duplicate photos

Posted 2019-03-31 01:45

Question:

I've come across a confusing issue with the Flickr API.

When I do a photo search (flickr.photos.search) and request high page numbers, I often get the same photos back for different page numbers. Here are three URLs that should each return a different set of three images, but, bizarrely, they all return the same images:

http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=ca3035f67faa0fcc72b74cf6e396e6a7&tags=gizmo&tag_mode=all&per_page=3&page=6820
http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=ca3035f67faa0fcc72b74cf6e396e6a7&tags=gizmo&tag_mode=all&per_page=3&page=6821
http://api.flickr.com/services/rest/?method=flickr.photos.search&api_key=ca3035f67faa0fcc72b74cf6e396e6a7&tags=gizmo&tag_mode=all&per_page=3&page=6822
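One way to confirm the symptom is to collect the photo IDs returned for each page and check whether any ID shows up on more than one page. This is a minimal sketch that assumes the IDs have already been parsed out of each response (for example, from the `id` attribute of each `photo` element); the page-to-IDs mapping, the sample IDs and the `duplicate_ids` helper are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_ids(pages: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Return {photo_id: [page, ...]} for every ID that appears on 2+ pages."""
    seen: Dict[str, List[int]] = defaultdict(list)
    for page in sorted(pages):
        # set() so an ID repeated within a single page is recorded once for it
        for photo_id in set(pages[page]):
            seen[photo_id].append(page)
    return {pid: pgs for pid, pgs in seen.items() if len(pgs) > 1}


# Hypothetical IDs mimicking the symptom described in the question.
pages = {6820: ["p1", "p2", "p3"], 6821: ["p1", "p2", "p3"], 6822: ["p1", "p2", "p3"]}
print(duplicate_ids(pages)["p1"])  # [6820, 6821, 6822]
```

An empty result means every page returned distinct photos.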

Has anyone else come across this? I seem to be able to recreate this on any tag search.

Cheers.

Answer 1:

After further investigation, it seems there's an undocumented "feature" built into the API that never lets flickr.photos.search return more than 4000 photos.

So although 7444 pages are reported as available, only the first 1333 can actually be loaded (at 3 photos per page, 1333 × 3 = 3999 photos, just under the 4000 limit).
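The 1333 figure follows from dividing the cap by the page size. A minimal sketch, assuming the cap is exactly 4000 results as reported above and counting only pages that fall entirely within it; `reachable_pages` is a hypothetical helper:

```python
def reachable_pages(total_pages: int, per_page: int, cap: int = 4000) -> int:
    """Number of result pages that lie entirely within the first `cap` results."""
    # Integer division: a page straddling the cap is not counted as reachable.
    return min(total_pages, cap // per_page)


print(reachable_pages(7444, 3))  # 1333
```

With per_page=3, pages 6820 to 6822 from the question lie far beyond this limit.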



Answer 2:

It is possible to retrieve more than 4000 images from Flickr: split your query (for example, by time range) so that each sub-query returns no more than 4000 images in total. You can also use other parameters, such as a bounding box, to limit the total number of images in each response.
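A request restricted to one time window (and, optionally, a bounding box) might be built as below. This is a sketch: `windowed_search_url` is a hypothetical helper, `YOUR_API_KEY` is a placeholder, and the window is expressed through the `min_upload_date` and `max_upload_date` parameters of flickr.photos.search as Unix timestamps. The question's URLs use http; the sketch assumes the HTTPS form of the same endpoint and asks for plain JSON via `format=json` and `nojsoncallback=1`.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse

REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def windowed_search_url(api_key: str, tags: str, min_date: int, max_date: int,
                        page: int = 1, per_page: int = 100,
                        bbox: Optional[Tuple[float, float, float, float]] = None) -> str:
    """Build a flickr.photos.search URL limited to [min_date, max_date]."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": tags,
        "tag_mode": "all",
        "min_upload_date": min_date,  # Unix timestamp, lower bound of the window
        "max_upload_date": max_date,  # Unix timestamp, upper bound of the window
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    if bbox is not None:
        # Order: min_longitude, min_latitude, max_longitude, max_latitude
        params["bbox"] = ",".join(str(v) for v in bbox)
    return REST_ENDPOINT + "?" + urlencode(params)


# 631152000 = 1 Jan 1990 00:00 UTC, 1420070400 = 1 Jan 2015 00:00 UTC
url = windowed_search_url("YOUR_API_KEY", "dogs", 631152000, 1420070400, page=2)
print(parse_qs(urlparse(url).query)["min_upload_date"])  # ['631152000']
```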

For example, if you are searching with the tag 'dogs', you can do a binary search over the time range:

  1. Specify a minimum date and a maximum date in the request URL (for example, with the min_upload_date and max_upload_date parameters), such as Jan 1, 1990 and Jan 1, 2015.
  2. Inspect the total number of images reported in the response (the `total` attribute of the `photos` element). If it is more than 4000, split the time range in half and repeat on the first half until the query returns 4000 or fewer images. Then request every page for that time range, move on to the next interval and do the same, until either (a) you have the number of images you need, or (b) you have covered the whole initial time interval.
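The time-range bisection described in the steps above can be sketched as a generator that yields windows in chronological order, splitting any window whose reported total exceeds the cap. Assumptions: timestamps are integer Unix seconds, both date bounds are treated as inclusive (worth checking against the API documentation), and `count_fn` is a hypothetical helper standing in for a search request that reads the `total` attribute of the response. The offline demo uses fake upload times.

```python
from typing import Callable, Iterator, Tuple

Window = Tuple[int, int]  # inclusive (min_upload_date, max_upload_date), Unix seconds


def iter_time_windows(start: int, end: int,
                      count_fn: Callable[[int, int], int],
                      cap: int = 4000) -> Iterator[Window]:
    """Yield non-overlapping windows covering [start, end] in chronological
    order, halving any window whose count exceeds `cap`."""
    stack = [(start, end)]
    while stack:
        a, b = stack.pop()
        if a == b or count_fn(a, b) <= cap:
            # Small enough to page through (or a single second that cannot be
            # split further, in which case results past `cap` stay unreachable).
            yield (a, b)
        else:
            mid = (a + b) // 2
            # Push the later half first so the earlier half is handled next.
            stack.append((mid + 1, b))
            stack.append((a, mid))


# Offline demo: 10,000 fake uploads, one per second from t=0 to t=9999.
uploads = range(10_000)


def fake_count(a: int, b: int) -> int:
    return sum(1 for t in uploads if a <= t <= b)


windows = list(iter_time_windows(0, 9_999, fake_count))
print(windows)  # [(0, 2499), (2500, 4999), (5000, 7499), (7500, 9999)]
```

Because it is a generator, the caller can page through each window as soon as it is yielded and stop once enough images have been collected, which corresponds to stopping condition (a); exhausting the generator corresponds to condition (b).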


Tags: flickr