Text file of all titles / topic titles in Freebase

Posted 2019-05-26 20:03

I need a .txt file that contains the title of every topic / item, each title on its own line.

How can I make this if I have already downloaded a Freebase RDF dump?

If possible, I also need a separate text file with each topic's / item's description, one description per line.

How can I do that?

I would greatly appreciate it if someone could help me make either of these files from a Freebase RDF dump.

Thanks in advance!

1 answer
Bombasti
2019-05-26 20:28

Filter the RDF dump on the predicate/property ns:type.object.name. If you only want a particular language, also filter on that language tag, e.g. @en.
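A minimal sketch of that filter as a shell function, assuming the tab-separated line layout (subject, predicate, object, with each line ending in a dot) that the regex below also relies on. The function name extract_names and the output file name are hypothetical, and escape sequences inside names (such as \") are not decoded:

```sh
# Hypothetical helper: print one English name per line from a gzipped
# Freebase RDF dump. Assumes tab-separated lines of the form
#   subject<TAB>predicate<TAB>object.
# where an English name object looks like "Some Title"@en
extract_names() {
  # $1 = gzipped dump, $2 = output text file
  gzip -dc "$1" | awk -F '\t' '
    $2 == "ns:type.object.name" && $3 ~ /@en\.$/ {
      v = $3
      sub(/@en\.$/, "", v)  # drop the language tag and the trailing dot
      sub(/^"/, "", v)      # drop the opening quote
      sub(/"$/, "", v)      # drop the closing quote
      print v               # escape sequences inside the name are left as-is
    }' > "$2"
}

# Usage (file names are placeholders):
# extract_names freebase-rdf-2013-06-30-00-00.gz titles.txt
```

Swapping the predicate test for ns:common.topic.description would produce the descriptions file in the same way.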

EDIT: I missed the second part of the question, about descriptions. Here's a three-part regex that will get you all the lines with:

  1. English names
  2. English descriptions
  3. a type of /common/topic

Combining the three is left as an exercise for the reader.

zegrep $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*@en)|type\\.object\\.type\tns:common\\.topic)\\.$' freebase-rdf-2013-06-30-00-00.gz | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz

This command seems to have a performance problem. A simple grep of the entire file takes about 11 minutes on my laptop, but this one has been running several times longer than that. I'll have to look into it later.
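One possible way to do the combining step, sketched as a shell function under the same assumption about the line layout; split_topics and the output file names are hypothetical. It reads the input twice: first to collect the subjects typed common.topic, then to write the English names and descriptions of those subjects to two separate files. The collected IDs are kept in memory in an awk array, so running it on the smaller filtered file produced by the zegrep command above is likely more practical than running it on the full dump:

```sh
# Hypothetical helper: keep English names and descriptions only for
# subjects typed /common/topic and write them to two separate files.
# Assumes $1 is gzipped and tab-separated (subject<TAB>predicate<TAB>object.),
# e.g. the output of the zegrep command above.
split_topics() {
  src=$1 out_names=$2 out_descs=$3
  ids=$(mktemp)
  # Pass 1: remember every subject whose type is common.topic.
  gzip -dc "$src" | awk -F '\t' \
    '$2 == "ns:type.object.type" && $3 == "ns:common.topic." { print $1 }' > "$ids"
  : > "$out_names"; : > "$out_descs"   # create/truncate both outputs
  # Pass 2: load the IDs (first file), then stream the input again (stdin)
  # and print the English names and descriptions of those subjects.
  gzip -dc "$src" | awk -F '\t' -v names="$out_names" -v descs="$out_descs" '
    NR == FNR { topic[$1] = 1; next }
    ($1 in topic) && $3 ~ /@en\.$/ {
      v = $3
      sub(/@en\.$/, "", v); sub(/^"/, "", v); sub(/"$/, "", v)
      if ($2 == "ns:type.object.name") print v > names
      else if ($2 == "ns:common.topic.description") print v > descs
    }' "$ids" -
  rm -f "$ids"
}

# Usage (file names are placeholders):
# split_topics freebase-rdf-2013-06-30-00-00-names-descriptions.gz titles.txt descriptions.txt
```

As written, the two output files are not aligned line by line; printing $1 alongside each value would keep the topic ID next to every name and description so the two files can be matched up later.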
