Reach a string behind unknown value in JSON

Posted 2019-07-03 05:24

Question:

I use Wikipedia's API to get information about a page. The API gives me JSON like this:

"query":{
  "pages":{
     "188791":{
        "pageid":188791,
        "ns":0,
        "title":"Vanit\u00e9",
        "langlinks":[
           {
              "lang":"bg",
              "*":"Vanitas"
           },
           {
              "lang":"ca",
              "*":"Vanitas"
           },
           ETC.
        ]
     }
  }
}

You can see the full JSON response.

I want to obtain all entries like:

{
   "lang":"ca",
   "*":"Vanitas"
}

but the number key ("188791") in the pages object is the problem.

I found "Find a value within nested json dictionary in python", which explains how to enumerate the values.

Unfortunately I get the following exception:

TypeError: 'dict_values' object does not support indexing

My code is:

json["query"]["pages"].values()[0]["langlinks"]

It's probably a dumb question but I can't find a way to pass in the page id value.

Answer 1:

As long as you're only querying one page at a time, Simeon Visser's answer will work. However, as a matter of good style, I'd recommend structuring your code so that you iterate over all the returned results, even if you know there should be only one:

for page in data["query"]["pages"].values():
    title = page["title"]
    langlinks = page["langlinks"]
    # do something with langlinks...

In particular, by writing your code this way, if you ever find yourself needing to run the query for multiple pages, you can do it efficiently with a single MediaWiki API request.
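As a rough sketch of that loop over a multi-page result: the sample below is hypothetical (the second entry, its "-1" key, and its title are made up for illustration), and it assumes that a page with no language links simply omits the "langlinks" key, which is why .get() with a default is used.

```python
# Hypothetical sample shaped like the response in the question.
# The second entry is invented to show a page without "langlinks".
data = {
    "query": {
        "pages": {
            "188791": {
                "pageid": 188791,
                "ns": 0,
                "title": "Vanit\u00e9",
                "langlinks": [
                    {"lang": "bg", "*": "Vanitas"},
                    {"lang": "ca", "*": "Vanitas"},
                ],
            },
            "-1": {"ns": 0, "title": "No such page", "missing": ""},
        }
    }
}


def langlinks_by_title(response):
    """Map each returned page title to its list of langlinks."""
    result = {}
    for page in response["query"]["pages"].values():
        # Assumption: pages without language links have no "langlinks" key.
        result[page["title"]] = page.get("langlinks", [])
    return result


links = langlinks_by_title(data)
for title, langlinks in links.items():
    print(title, len(langlinks))
```

The page ID keys are never referenced, so the same function works whether the response holds one page or many.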



Answer 2:

One solution is to use the indexpageids parameter, e.g.: http://fr.wikipedia.org/w/api.php?action=query&titles=Vanit%C3%A9&prop=langlinks&lllimit=500&format=jsonfm&indexpageids. It adds an array of page IDs (pageids, inside query) to the response. You can then use those IDs as keys to access the pages dictionary.
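A minimal sketch of how that lookup might look once the response is parsed. The sample dictionary is hypothetical: it is the response from the question with a "pageids" list added under "query", holding the IDs as strings so they match the keys of "pages". Note also that format=jsonfm in the URL above is the human-readable HTML rendering; a script would presumably request format=json instead.

```python
# Hypothetical parsed response, assuming indexpageids adds a "pageids"
# list of string IDs next to "pages".
data = {
    "query": {
        "pageids": ["188791"],
        "pages": {
            "188791": {
                "pageid": 188791,
                "ns": 0,
                "title": "Vanit\u00e9",
                "langlinks": [
                    {"lang": "bg", "*": "Vanitas"},
                    {"lang": "ca", "*": "Vanitas"},
                ],
            }
        },
    }
}

# Use the first listed ID as the key into the pages dictionary.
first_id = data["query"]["pageids"][0]
langlinks = data["query"]["pages"][first_id]["langlinks"]
print(langlinks)
```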



Answer 3:

You're using Python 3, where values() returns a dict_values object instead of a list. This is a view on the values of the dictionary.

That is why you get the error: a list supports indexing, but a view does not.

To fix it:

list(json["query"]["pages"].values())[0]["langlinks"]
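To illustrate the difference, here is a small self-contained sketch using a hypothetical one-page dictionary. It also shows next(iter(...)) as an alternative that fetches the first value without building a whole list; that alternative is an addition here, not part of the fix above. The variable is named pages rather than json to avoid shadowing the standard json module.

```python
# Hypothetical stand-in for json["query"]["pages"] from the question.
pages = {
    "188791": {
        "pageid": 188791,
        "langlinks": [{"lang": "bg", "*": "Vanitas"}],
    }
}

values = pages.values()  # a dict_values view in Python 3

# Indexing the view directly raises TypeError.
try:
    values[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Option 1: copy the view into a list, then index it.
first_via_list = list(values)[0]

# Option 2: take the first item from an iterator over the view.
first_via_iter = next(iter(values))

print(first_via_list["langlinks"])
```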


Answer 4:

If you really want just one page arbitrarily, do that the way Simeon Visser suggested.

But I suspect you want all langlinks in all pages, yes?

For that, you want a comprehension:

[page["langlinks"] for page in json["query"]["pages"].values()]

But of course that gives you a 2D list. If you want to iterate over each page's links, that's perfect. If you want to iterate over all of the langlinks at once, you want to flatten the list:

[langlink for page in json["query"]["pages"].values()
 for langlink in page["langlinks"]]

… or…

import itertools

itertools.chain.from_iterable(page["langlinks"]
                              for page in json["query"]["pages"].values())

(The latter gives you an iterator; if you need a list, wrap the whole thing in list(). Conversely, for the first two, if you don't need a list, just any iterable, use parentheses instead of square brackets to get a generator expression.)
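The three variants can be compared side by side in a short sketch. The two-page sample is hypothetical (the second page, its ID, and its title are invented), and the parsed response is called data here rather than json so the standard json module is not shadowed.

```python
import itertools

# Hypothetical two-page response in the shape shown in the question.
data = {
    "query": {
        "pages": {
            "188791": {
                "title": "Vanit\u00e9",
                "langlinks": [
                    {"lang": "bg", "*": "Vanitas"},
                    {"lang": "ca", "*": "Vanitas"},
                ],
            },
            "12345": {
                "title": "Example",
                "langlinks": [{"lang": "de", "*": "Beispiel"}],
            },
        }
    }
}

pages = data["query"]["pages"].values()

# One list of langlinks per page (a list of lists).
nested = [page["langlinks"] for page in pages]

# All langlinks in one flat list, via a nested comprehension.
flat = [langlink for page in pages for langlink in page["langlinks"]]

# Same result via itertools; list() consumes the iterator.
chained = list(
    itertools.chain.from_iterable(page["langlinks"] for page in pages)
)

print([link["lang"] for link in flat])
```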