
Retrieving image license and author information in

Posted 2020-07-05 03:46

Question:

I am trying to use the wikimedia API for wiki commons at:

http://commons.wikimedia.org/w/api.php

It seems like the Commons API is very immature, and the part of their documentation that mentions the possibility of retrieving license and author information is empty.

Is there any way I can retrieve the paragraph that contains the licensing information using the API (for example, the paragraph under the heading "Licensing" on this page)? Of course I can download the whole page and try to parse it, but what are APIs for?

Answer 1:

Late answer, but you can request the "extmetadata" data with the following query:

http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=extmetadata&titles=File%3aBrad_Pitt_at_Incirlik2.jpg&format=json

Look under imageinfo.extmetadata.UsageTerms, Artist, Credit, etc.
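A minimal Python sketch of the same query, using only the standard library. It assumes the Commons endpoint rather than en.wikipedia, and the function names, the User-Agent string and the default list of fields are illustrative choices, not part of any official client. Each extmetadata entry is an object whose "value" member holds the text; Artist and Credit values can contain HTML.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"
# Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
USER_AGENT = "license-lookup-example/0.1 (contact: you@example.org)"


def build_extmetadata_url(file_title, api_url=API_URL):
    """Build an imageinfo query that asks for the extended metadata of one file."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": file_title,
        "format": "json",
    }
    # urlencode escapes the title (e.g. ':' becomes %3A).
    return api_url + "?" + urllib.parse.urlencode(params)


def license_fields(response,
                   keys=("LicenseShortName", "UsageTerms", "Artist", "Credit")):
    """Pull the requested extmetadata values out of a decoded API response."""
    result = {}
    # 'pages' is a dict keyed by page ID (format=json, default formatversion).
    for page in response["query"]["pages"].values():
        # A missing file comes back without an 'imageinfo' list.
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            for key in keys:
                if key in meta:
                    result[key] = meta[key]["value"]
    return result


def fetch_license_fields(file_title):
    """Run the query against the live API and return the license fields."""
    request = urllib.request.Request(
        build_extmetadata_url(file_title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as resp:
        return license_fields(json.load(resp))
```

For example, fetch_license_fields("File:Brad_Pitt_at_Incirlik2.jpg") should return whichever of those fields the file's description page provides.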



Answer 2:

You could try using Magnus Manske's Commons API tool on the Wikimedia Toolserver. It's not an official service, and the documentation seems to be rather sparse (that is to say, almost nonexistent), but the XML output seems pretty self-explanatory.

I can't seem to find the source for Magnus's script anywhere, but I assume it extracts the licensing information from the categories the file belongs to. If you wanted, you could do that yourself: just fetch the list of categories and, if necessary, walk up the category tree until you find a license category you recognize. Alas, the tree-walking part requires either multiple API requests or a database of Commons categories (either live access on the Toolserver, or a reconstructed copy from the database dumps).
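The category approach described above can be sketched as a breadth-first walk up the tree. parent_categories() issues one prop=categories request per title; is_license is a hypothetical predicate you supply, because deciding which categories count as license categories is up to you. The get_parents argument is injectable so the walk can be tested offline, and max_depth caps how many levels (and therefore requests) are explored. Continuation of very long category lists is not handled.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://commons.wikimedia.org/w/api.php"
USER_AGENT = "license-lookup-example/0.1 (contact: you@example.org)"  # placeholder


def parent_categories(title):
    """Return the category titles that `title` (a file or category page) is in."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",  # no 'continue' handling in this sketch
        "format": "json",
    })
    request = urllib.request.Request(API_URL + "?" + params,
                                     headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as resp:
        data = json.load(resp)
    categories = []
    for page in data["query"]["pages"].values():
        categories.extend(c["title"] for c in page.get("categories", []))
    return categories


def find_license_category(file_title, is_license,
                          get_parents=parent_categories, max_depth=3):
    """Breadth-first walk up the category tree until is_license() accepts one."""
    seen = set()
    queue = deque([(file_title, 0)])
    while queue:
        title, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for category in get_parents(title):
            if category in seen:
                continue  # category graphs can contain cycles
            seen.add(category)
            if is_license(category):
                return category
            queue.append((category, depth + 1))
    return None
```

A call might look like find_license_category("File:Brad_Pitt_at_Incirlik2.jpg", lambda c: c.startswith("Category:CC-")), where the prefix test is only an illustration of a predicate.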

Yes, I realize that this answer may seem unsatisfactory. The fact is that Magnus's script seems to be the closest currently existing thing to what you want, and even it's marked as experimental and incomplete. Basically, this is a problem waiting for someone to implement a (better) solution.



Answer 3:

I've used Magnus' Commons API tool. It's not designed to be dropped straight into a project, but if you copy the source of the wiki page it calls, cache it locally, and move the logic into a class, it becomes much easier to call. Here's the source for Magnus' version. If you want the class I created from it, let me know and I'll dig it out.



Answer 4:

From http://www.mediawiki.org/wiki/API_talk:Main_page#Image_license_information:

"Is there a way to get the license of an image through the api?"

"By category is probably easiest, assuming the site categorizes by license. There is no built in module though for license information." (Splarka, 08:45, 22 January 2010 (UTC))

However, I find that using categories returns nothing for many images even though they have a license specified. Maybe the best way is to parse the rendered HTML of the image page.
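If you go the HTML route, action=parse with prop=text returns the rendered page. The sketch below assumes that Commons license templates mark the short license name with the CSS class licensetpl_short; treat that class name, and the helper names, as assumptions to check against the pages you actually parse. The parser is deliberately rough: it tracks nesting by counting tags, skips void elements such as br and img, and can be confused by unclosed non-void elements.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

API_URL = "https://commons.wikimedia.org/w/api.php"
USER_AGENT = "license-lookup-example/0.1 (contact: you@example.org)"  # placeholder
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.current = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag
        if self.depth:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def texts_with_class(html, css_class):
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def rendered_file_page(file_title):
    """Fetch the rendered HTML of a file description page via action=parse."""
    params = urllib.parse.urlencode({"action": "parse", "page": file_title,
                                     "prop": "text", "format": "json",
                                     "formatversion": "2"})
    request = urllib.request.Request(API_URL + "?" + params,
                                     headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["parse"]["text"]
```

Usage would be texts_with_class(rendered_file_page("File:Brad_Pitt_at_Incirlik2.jpg"), "licensetpl_short").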



Answer 5:

Have a look at the MediaWiki API and try this function:

import requests

def extract_image_license(image_name):
    # Ask the Commons API for the uploader, title, URL and extended
    # metadata (which includes the license fields) of a single file.
    params = {
        'action': 'query',
        'titles': 'File:' + image_name,
        'prop': 'imageinfo',
        'iiprop': 'user|userid|canonicaltitle|url|extmetadata',
        'format': 'json',
    }
    # Passing params lets requests URL-encode the file name.
    result = requests.get('https://commons.wikimedia.org/w/api.php',
                          params=params).json()
    # 'pages' is keyed by page ID; take the only page returned.
    page = next(iter(result['query']['pages'].values()))
    # A missing file has no 'imageinfo' key, so fall back to an empty list.
    return page.get('imageinfo', [])

Then call the function, passing the name of the image you want to query, for example:

extract_image_license('Albert_Einstein_Head.jpg')


Answer 6:

See the page http://www.mediawiki.org/wiki/API:Meta

For each image, you can add the parameters 'meta=siteinfo' and 'siprop=rightsinfo' (siprop selects which siteinfo properties are returned). The response will then include the rightsinfo.

For the Brad Pitt example, the query would look like this:

http://en.wikipedia.org/w/api.php?format=jsonfm&action=query&titles=File:Brad_Pitt_at_Incirlik2.jpg&prop=imageinfo&iiprop=url&meta=siteinfo&siprop=rightsinfo
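To see where each part of that combined query ends up, here is a small sketch that builds the same request (with format=json instead of the HTML-formatted jsonfm) and splits a decoded response. In the response, rightsinfo sits under query.rightsinfo, next to query.pages rather than inside the file's page entry. The function names are illustrative, not part of any library.

```python
import urllib.parse


def build_combined_url(file_title, api_url="https://en.wikipedia.org/w/api.php"):
    """Same parameters as the query above, but with machine-readable JSON."""
    params = {
        "action": "query",
        "format": "json",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "url",
        "meta": "siteinfo",
        "siprop": "rightsinfo",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def split_combined_response(response):
    """Separate the per-file image URLs from the siteinfo rightsinfo block."""
    query = response["query"]
    image_urls = {}
    for page in query["pages"].values():
        for info in page.get("imageinfo", []):
            image_urls[page["title"]] = info["url"]
    # rightsinfo is returned at the query level, not per page.
    rights = query.get("rightsinfo", {})
    return image_urls, rights.get("text"), rights.get("url")
```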