Getting Actor Ids and biographies from the data d

Posted 2019-05-28 02:55

Does anyone know the best way of getting Actor Ids from Freebase data dumps, and later on getting the IMDB ids and biographies from the Freebase API?

Answer 1:

Actors will have the type /film/actor and look like this in the dump:

ns:m.010q36     rdf:type        ns:film.actor.

You can find them all in a few minutes from the compressed dump with a simple grep:

zgrep $'rdf:type\tns:film.actor.' freebase-rdf-<date of dump>.gz | cut -f 1 | cut -d ':' -f 2 > actor-mids.txt

This will generate a list of MIDs in the form m.010q36, each of which corresponds to a MID written as /m/010q36.

Using the list of MIDs, look for all lines which have one of those MIDs in the first column and one of your desired properties in the second. You could do this using Python, grep, or the tool/language of your choice. Of course, if you're using a programming language like Python, you could fold the initial search for actors into the same script.
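The filtering step described above could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes the dump is tab-separated with a trailing period on each line (as the zgrep pattern suggests), and the names load_mids, matching_triples and main are made up for illustration. The paths are taken from the command line.

```python
import gzip
import sys


def load_mids(path):
    """Read one MID per line (e.g. m.010q36) into a set for fast lookup."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def matching_triples(lines, mids, properties):
    """Yield (mid, property, value) for lines whose first column is a wanted
    MID and whose second column is a wanted property.

    Assumes columns are tab-separated and the subject carries an "ns:" prefix.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3 or not parts[0].startswith("ns:"):
            continue
        mid, prop, value = parts[0][3:], parts[1], parts[2].strip()
        if mid in mids and prop in properties:
            # Drop the statement-terminating period, if present.
            if value.endswith("."):
                value = value[:-1]
            yield mid, prop, value


def main(dump_path, mids_path):
    mids = load_mids(mids_path)
    wanted = {"ns:type.object.key"}  # add other properties as needed
    # gzip.open in text mode streams the dump without unpacking it to disk.
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump:
        for mid, prop, value in matching_triples(dump, mids, wanted):
            print(mid, prop, value, sep="\t")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Keeping the MIDs in a set means each dump line costs one constant-time lookup, so the whole thing is a single pass over the file.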

Wikipedia and IMDB IDs are stored as what Freebase calls keys and look like this (MusicBrainz & Netflix included too):

ns:m.010q36     ns:type.object.key      "/wikipedia/en/Mr$002ERodgers".
ns:m.010q36     ns:type.object.key      "/authority/imdb/name/nm0736872".
ns:m.010q36     ns:type.object.key      "/authority/musicbrainz/87467525-3724-412d-ad3e-595ecb6a3bfd".
ns:m.010q36     ns:type.object.key      "/authority/netflix/role/30006685".
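Once you have the key values, pulling the individual IDs out of them is a matter of splitting on the last slash. A small sketch, assuming the value still carries its surrounding quotes and trailing period as in the lines above (split_key and imdb_id are hypothetical helper names):

```python
def split_key(value):
    """Split a key literal such as '"/authority/imdb/name/nm0736872".'
    into (namespace, key), e.g. ('/authority/imdb/name', 'nm0736872')."""
    text = value.strip()
    if text.endswith("."):
        text = text[:-1]          # statement-terminating period
    text = text.strip('"')        # surrounding quotes of the literal
    namespace, _, key = text.rpartition("/")
    return namespace, key


def imdb_id(value):
    """Return the IMDb name ID if the value is an IMDb key, else None."""
    namespace, key = split_key(value)
    return key if namespace == "/authority/imdb/name" else None


print(imdb_id('"/authority/imdb/name/nm0736872".'))
```

The same split_key call gives you the MusicBrainz and Netflix IDs by checking for their namespaces instead.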

Keys may be encoded (like the Wikipedia key above). You can find documentation on the Freebase wiki on how to deal with them.
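As an illustration, here is a minimal decoding sketch. It assumes the encoding is a dollar sign followed by four hexadecimal digits giving a Unicode code point, which is what the $002E in the Wikipedia key above suggests; check the documentation before relying on it. decode_key is a made-up name.

```python
import re

# A "$" followed by exactly four hex digits, e.g. $002E for "."
_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{4})")


def decode_key(key):
    """Replace each $XXXX escape with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), key)


print(decode_key("Mr$002ERodgers"))
```

Anything that does not match the $XXXX pattern is passed through unchanged.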