I'm trying to parse several big graphs with RDFLib 3.0, apparently it handles first one and dies on the second (MemoryError)... looks like MySQL is not supported as store anymore, can you please suggest a way to somehow parse those?
Traceback (most recent call last):
File "names.py", line 152, in <module>
main()
File "names.py", line 91, in main
locals()[graphname].parse(filename, format="nt")
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/graph.py", line 938, in parse
location=location, file=file, data=data, **args)
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/graph.py", line 757, in parse
parser.parse(source, self, **args)
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/plugins/parsers/nt.py", line 24, in parse
parser.parse(f)
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/plugins/parsers/ntriples.py", line 124, in parse
self.line = self.readline()
File "/usr/local/lib/python2.6/dist-packages/rdflib-3.0.0-py2.6.egg/rdflib/plugins/parsers/ntriples.py", line 151, in readline
m = r_line.match(self.buffer)
MemoryError
How many triples on those RDF files ? I have tested
rdflib
and it won't scale much further than few tens of ktriples - if you are lucky. No way it really performs well for files with millions of triples.The best parser out there is
rapper
from Redland Libraries. My first advice is to not useRDF/XML
and go forntriples
. Ntriples is a lighter format than RDF/XML. You can transform from RDF/XML to ntriples usingrapper
:rapper -i rdfxml -o ntriples YOUR_FILE.rdf > YOUR_FILE.ntriples
If you like Python you can use the Redland python bindings:
I have parsed fairly big files (couple of gigabyes) with redland libraries with no problem.
Eventually if you are handling big datasets you might need to assert your data into a scalable triple store, the one I normally use is 4store. 4store internally uses redland to parse RDF files. In the long term, I think, going for a scalable triple store is what you'll have to do. And with it you'll be able to use SPARQL to query your data and SPARQL/Update to insert and delete triples.