simplejson handling of same-named entities

Posted 2019-01-12 07:25

I'm using the Alchemy API on App Engine, so I'm using the simplejson library to parse responses. The problem is that the responses contain entries that have the same name:

 {
    "status": "OK",
    "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
    "url": "",
    "language": "english",
    "entities": [
        {
            "type": "Person",
            "relevance": "0.33",
            "count": "1",
            "text": "Michael Jordan",
            "disambiguated": {
                "name": "Michael Jordan",
                "subType": "Athlete",
                "subType": "AwardWinner",
                "subType": "BasketballPlayer",
                "subType": "HallOfFameInductee",
                "subType": "OlympicAthlete",
                "subType": "SportsLeagueAwardWinner",
                "subType": "FilmActor",
                "subType": "TVActor",
                "dbpedia": "http://dbpedia.org/resource/Michael_Jordan",
                "freebase": "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000029161",
                "umbel": "http://umbel.org/umbel/ne/wikipedia/Michael_Jordan",
                "opencyc": "http://sw.opencyc.org/concept/Mx4rvViVq5wpEbGdrcN5Y29ycA",
                "yago": "http://mpii.de/yago/resource/Michael_Jordan"
            }
        }
    ]
}

The problem is that "subType" is repeated, so the dict that loads() returns contains only "TVActor" rather than a list of all the values. Is there any way around this?

1 answer
beautiful°
#2 · 2019-01-12 07:51

RFC 4627, which defines application/json, says:

An object is an unordered collection of zero or more name/value pairs

And:

The names within an object SHOULD be unique.

This means that AlchemyAPI should not return multiple "subType" names inside the same object and call the result JSON.
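To illustrate why only the last value survives, here is a minimal sketch. It uses the standard-library json module rather than simplejson (an assumption made so the snippet is self-contained; the two are expected to behave the same way here). The default decoder builds a plain dict, so each repeated name overwrites the previous one, while object_pairs_hook still sees every pair in document order:

```python
import json

text = '{"type": "Person", "subType": "Athlete", "subType": "AwardWinner"}'

# Default decoding builds a dict, so a later duplicate key
# silently replaces the earlier value.
data = json.loads(text)
print(data["subType"])

# object_pairs_hook is called with the full list of (name, value)
# pairs, duplicates included, before any dict is built.
pairs = json.loads(text, object_pairs_hook=list)
print(pairs)
```

So the duplicate values are not lost by the parser itself, only by the step that turns the pairs into a dict.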

You could request the same data in XML format (outputMode=xml) to avoid the ambiguity, or convert the values of duplicate keys into lists:

import simplejson as json
from collections import defaultdict

def multidict(ordered_pairs):
    """Collect the values of duplicate keys into lists."""
    # read all values into lists
    d = defaultdict(list)
    for k, v in ordered_pairs:
        d[k].append(v)

    # unpack lists that have only 1 item
    for k, v in d.items():
        if len(v) == 1:
            d[k] = v[0]
    return dict(d)

# `text` is the JSON string to decode (see the example below)
print(json.JSONDecoder(object_pairs_hook=multidict).decode(text))

Example

text = """{
  "type": "Person",
  "subType": "Athlete",
  "subType": "AwardWinner"
}"""

Output

{u'subType': [u'Athlete', u'AwardWinner'], u'type': u'Person'}
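One thing to watch for with this approach: an entity with a single "subType" decodes to a string, while one with several decodes to a list, so calling code has to handle both. A possible variant, sketched below, always returns a list for selected keys. The names LIST_KEYS and multidict_with_list_keys are hypothetical, and the standard-library json module is used here on the assumption that simplejson accepts the same object_pairs_hook argument:

```python
import json
from collections import defaultdict

# Hypothetical setting: keys that should always decode to a list,
# even when they appear only once in an object.
LIST_KEYS = {"subType"}

def multidict_with_list_keys(ordered_pairs):
    """Group duplicate keys into lists; keep LIST_KEYS as lists always."""
    grouped = defaultdict(list)
    for key, value in ordered_pairs:
        grouped[key].append(value)
    # Unwrap single values unless the key is listed in LIST_KEYS.
    return {
        key: values if key in LIST_KEYS or len(values) > 1 else values[0]
        for key, values in grouped.items()
    }

single = json.loads(
    '{"name": "A", "subType": "Athlete"}',
    object_pairs_hook=multidict_with_list_keys,
)

# The hook runs for every JSON object, including nested ones.
nested = json.loads(
    '{"entities": [{"disambiguated": {"name": "Michael Jordan",'
    ' "subType": "Athlete", "subType": "FilmActor"}}]}',
    object_pairs_hook=multidict_with_list_keys,
)
sub_types = nested["entities"][0]["disambiguated"]["subType"]
```

Because the hook is applied to nested objects as well, it can be passed once when decoding the whole AlchemyAPI response rather than per entity.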