Are there any existing ways of using the Freebase data dumps to create a database similar to what Freebase offers, but on your own server? Pretty much Freebase, but locally and not through the API?
I guess it would be possible to create one, but are there any existing solutions for this already? Or any alternative solutions for similar data without using an API? I didn't find this for DBpedia either :|
Take a look at the freebase-quad-rdfize project on Google Code. It should allow you to download the weekly Freebase quad dump and load it into the RDF triple store of your choice.
An alternative to freebase-quad-rdfize is here: https://github.com/castagna/freebase2rdf
I use Apache Jena's TDB store to load the RDF data and Fuseki to serve the data via SPARQL protocol over HTTP.
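Once Fuseki is serving the store, anything that speaks HTTP can query it. Here is a minimal Python sketch using only the standard library. Port 3030 is Fuseki's default, but the dataset name `freebase`, the `/query` service path, and the helper names are assumptions for illustration, so adjust them to your own setup.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: dataset name and service path depend on your Fuseki config.
ENDPOINT = "http://localhost:3030/freebase/query"


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of {var: value} dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=60):
    """Send the query to the endpoint and return the parsed rows."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))


# Usage (needs a running server, so it is left commented out):
# rows = run_query(ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
```

Always put a `LIMIT` on exploratory queries against a dataset this size.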
See also:
- http://markmail.org/thread/mq6ylzdes6n7sc5o
- http://markmail.org/thread/jegtn6vn7kb62zof
Moreover, you now have another option: http://basekb.com/
Importing the data into a triple store of your choice wouldn't be hard - but you'll have great difficulties getting any answers out in a reasonable time unless you're doing something trivial.
Someone did import the whole dataset into MySQL a few years ago - it took 2 weeks to load and even simple queries like "the count of things typed as a person" took >1 minute to give an answer. That was on big hardware and the dataset is much bigger now than it was then.
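To make the "count of things typed as a person" query concrete, here is a rough sketch of what it looks like against a four-column quad table. It uses Python's built-in `sqlite3` as a stand-in for MySQL. The column names (`source`, `property`, `destination`, `value`), the use of `/type/object/type` as the typing property, and the sample rows are my assumptions about the quad dump layout, not the schema that import actually used. On a toy table this is instant; the point is only to show the shape of the query and the index it would lean on.

```python
import sqlite3

# Made-up sample rows in an assumed (source, property, destination, value) layout.
SAMPLE = [
    ("/m/01", "/type/object/type", "/people/person", None),
    ("/m/02", "/type/object/type", "/people/person", None),
    ("/m/02", "/type/object/type", "/music/artist", None),
    ("/m/03", "/type/object/type", "/location/location", None),
]


def make_quad_store(rows):
    """Create an in-memory quad table, load the rows, and index it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quad (source TEXT, property TEXT, destination TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO quad VALUES (?, ?, ?, ?)", rows)
    # Without an index on (property, destination) the count is a full table scan.
    conn.execute("CREATE INDEX quad_prop_dest ON quad (property, destination)")
    conn.commit()
    return conn


def count_typed_as(conn, type_id):
    """Count the distinct subjects that carry the given type."""
    cur = conn.execute(
        "SELECT COUNT(DISTINCT source) FROM quad "
        "WHERE property = ? AND destination = ?",
        ("/type/object/type", type_id),
    )
    return cur.fetchone()[0]


conn = make_quad_store(SAMPLE)
print(count_typed_as(conn, "/people/person"))
```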
I'm the creator of :BaseKB, the first usable conversion of Freebase to RDF.
There are key integrity problems in the Freebase quad dump that make it hard to get fully correct results from it. :BaseKB reconstructs the key structure of Freebase so that the unique name assumption holds. This is important, because the ability to write simple SPARQL queries that work like SQL queries depends on this.
Right now, :BaseKB exists in two editions. There's a free edition that consists of 120 million facts about 4 million topics (the ones from Wikipedia) and there's a "Pro" edition that contains everything.
As for the performance issues brought up by Phillip Kendall, I can say that it's mostly a matter of having enough RAM. With 24GB of RAM I can load the free edition into a triple store in an hour. Some queries take longer than I like, but overall query performance is good.
Anyone who wants to use the "Pro" edition is going to need unusually powerful hardware and will spend a good deal of effort getting their toolchain to work. I'm working with partners right now to deliver "Pro" to users in a satisfactory way.
If you can export the database to, say, tab-delimited or comma-separated values in TXT, or to database files such as MDB, XLS, or any other highly transportable data format, you'd have no problem building your own MySQL database on your computer from that data. The main thing is making sure you can export data from which you can rebuild your own database.
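As a rough illustration of that import step, here is a small Python sketch that streams a delimited text export into a table in batches. It uses the standard library's `sqlite3` in place of MySQL so that it runs anywhere. The function name, table name, and column names are hypothetical, and every column is stored as plain text.

```python
import csv
import io
import sqlite3


def load_delimited(conn, table, columns, lines, delimiter="\t", batch_size=10000):
    """Stream delimited rows into `table`; returns the number of rows inserted."""
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    insert = f'INSERT INTO "{table}" VALUES ({placeholders})'

    # QUOTE_NONE: treat quote characters as ordinary data, split only on the delimiter.
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    batch, total = [], 0
    for row in reader:
        if len(row) != len(columns):
            continue  # skip blank or malformed lines
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            total += len(batch)
            batch = []
    if batch:
        conn.executemany(insert, batch)
        total += len(batch)
    conn.commit()
    return total


# Tiny made-up sample standing in for an exported file.
sample = io.StringIO("/m/01\tname\tAlice\n/m/02\tname\tBob\nbroken line\n")
conn = sqlite3.connect(":memory:")
print(load_delimited(conn, "facts", ["id", "prop", "val"], sample))
```

For a real export you would pass an open file handle instead of the `StringIO` sample. Inserting in batches keeps memory use flat however large the file is.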