I was trying to import the Freebase RDF dump into Google Refine but got an error. How can I extract topic names together with their notable types from the 18 GB RDF dump into CSV or a similar format? Is there a GUI tool for this?
Question:
Answer 1:
146 GB is too big for OpenRefine (ex-Google Refine) to handle. If there is a GUI tool that will do this out of the box, I'm not familiar with it, but since this is a programming Q & A site, I'll give a shell programming solution. You don't need to know anything about Linux, but you do need to know how to use Unix shell commands (you could use Cygwin on Windows).
curl -L http://download.freebaseapps.com | gunzip | egrep 'notable_for|notable_type|rdfs:label'
will give you all the raw data that you need to assemble the solution. The lines with the key information look like this, but if you just want labels/names, you'll need to look up the rdfs:label for each ID and substitute it for the subject/object IDs in the first and last columns.
ns:m.01nsxs2 ns:common.topic.notable_types ns:m.0kpv17.
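The substitution step could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested solution for the full dump. It assumes that label lines look like `ns:m.0kpv17 rdfs:label "Some Type"@en.` (that format is my assumption, modelled on the sample line above), it keeps English labels only, and the function names `parse_triple`, `join_notable_types` and `write_csv` are made up for illustration. It also keeps every label in memory, which may not be practical for the whole dump; a disk-backed lookup would be the next thing to try.

```python
import csv
import re

# Assumed label format: "text"@en (English labels only).
LABEL_RE = re.compile(r'^"(.*)"@en$')


def parse_triple(line):
    """Split one dump line into (subject, predicate, object), or return None."""
    parts = line.strip().split(None, 2)
    if len(parts) != 3:
        return None
    subject, predicate, obj = parts
    # Drop the trailing "." that terminates each statement.
    if obj.endswith("."):
        obj = obj[:-1].rstrip()
    return subject, predicate, obj


def join_notable_types(lines):
    """Return (topic_name, type_name) pairs, falling back to the raw ID
    wherever no English label was seen."""
    labels = {}   # ID -> English label
    notable = []  # (topic ID, type ID) pairs in input order
    for line in lines:
        triple = parse_triple(line)
        if triple is None:
            continue
        subject, predicate, obj = triple
        if predicate == "rdfs:label":
            match = LABEL_RE.match(obj)
            if match:
                labels[subject] = match.group(1)
        elif predicate == "ns:common.topic.notable_types":
            notable.append((subject, obj))
    return [(labels.get(s, s), labels.get(o, o)) for s, o in notable]


def write_csv(lines, out):
    """Write the joined pairs to the file-like object `out` as CSV."""
    writer = csv.writer(out)
    writer.writerow(["topic", "notable_type"])
    writer.writerows(join_notable_types(lines))
```

To use it at the end of the pipeline above, call `write_csv(sys.stdin, sys.stdout)` from a small script (after `import sys`) and pipe the egrep output into it.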