-->

How to INSERT all triples from an RDFlib graph int

Posted 2019-09-05 06:55

Question:

This question is related to "What URI to use for a Sesame repository while executing a SPARQL ADD query".

I'm trying to INSERT all triples from a Sesame repository into another (Dydra). There are a couple of ways to do it, such as using SERVICE clause or Dydra's GUI. However, Dydra restricts the use of SERVICE and I want an efficient way to insert the data programmatically. This is the code I have right now:

    queryStringUpload = 'INSERT {?s ?p ?o} WHERE GRAPH %s {?s ?p ?o}' % dataGraph
    sparql = SPARQLWrapper(dydraSparqlEndpoint)
    sparql.setCredentials(key,key)
    sparql.setQuery(queryStringUpload)
    sparql.method = 'POST'
    sparql.query()

The code results in the following error:

client error: failed to parse after 'GRAPH' at offset 24 on line 1.
INSERT {?s ?p ?o} WHERE GRAPH [a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'IOMemory']]. {?s ?p ?o}
.

Basically, I understand that I'm incorrectly using string formatting. What is the correct way to execute the query?
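For reference, the error output shows two separate problems: the `%s` was filled with the printed form of the RDFLib graph object rather than an IRI, and the `WHERE` clause has no enclosing braces. A minimal sketch of a well-formed update string follows. The helper name `build_copy_update` and the example IRI are hypothetical. Note that such an update only copies triples from a named graph that already exists in the target store, so on its own it does not move data between two stores.

```python
def build_copy_update(graph_iri):
    """Return a SPARQL update that copies a named graph's triples into the default graph."""
    # Reject anything that is clearly not a plain IRI (for example the repr of a Graph object).
    if any(ch in graph_iri for ch in '<>" {}|\\^`'):
        raise ValueError("not a plain IRI: %r" % graph_iri)
    # WHERE needs its own braces, and GRAPH takes an IRI wrapped in angle brackets.
    return "INSERT { ?s ?p ?o } WHERE { GRAPH <%s> { ?s ?p ?o } }" % graph_iri


# Hypothetical graph IRI, for illustration only.
print(build_copy_update("http://example.org/my-graph"))
```

The resulting string could then be passed to `sparql.setQuery(...)` as in the code above.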

One way to programmatically do this is by iterating through every triple in dataGraph and individually INSERTing them. I've tried this approach. While the code works, not all of the data is ported. That's the reason I'm looking for a way to bulk port the data.
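A middle ground between one INSERT per triple and a single huge request is to send the data in batches with `INSERT DATA`. The sketch below assumes the graph is already available as N-Triples text (with RDFLib, something like `dataGraph.serialize(format='nt')` should produce it). The function name, the chunk size and the example IRIs are hypothetical. One caveat: blank nodes that end up in different batches will not be treated as the same node.

```python
def build_insert_data_updates(ntriples_text, graph_iri=None, chunk_size=1000):
    """Split N-Triples text into a list of SPARQL 'INSERT DATA' update strings."""
    # Each non-empty, non-comment N-Triples line holds exactly one triple.
    triples = [line.strip() for line in ntriples_text.splitlines()
               if line.strip() and not line.lstrip().startswith("#")]
    updates = []
    for start in range(0, len(triples), chunk_size):
        body = "\n".join(triples[start:start + chunk_size])
        if graph_iri is not None:
            # Optionally target a named graph instead of the default graph.
            body = "GRAPH <%s> {\n%s\n}" % (graph_iri, body)
        updates.append("INSERT DATA {\n%s\n}" % body)
    return updates


# Tiny made-up data set, for illustration only.
sample = """
<http://example.org/a> <http://example.org/p> "1" .
<http://example.org/b> <http://example.org/p> "2" .
<http://example.org/c> <http://example.org/p> "3" .
"""
updates = build_insert_data_updates(sample, graph_iri="http://example.org/g", chunk_size=2)
print(len(updates))
```

Each string in `updates` could be sent as its own POST update request, which also makes it easier to see which batch failed if some data goes missing.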

UPDATE 1

This is the code I tried for implementing the suggested answer:

    sesameURL = 'http://my.ip.ad.here:8080/openrdf-sesame/repositories/rep_name/statements'
    payloadPOST = {
        'url': sesameURL,
        # 'account[login]':key,
        # 'account[password]':'',
        # 'csrfmiddlewaretoken':csrfToken_new,
        # 'next':'/',
    }

    headersPOST = {
        'User-Agent': 'python',
        'Content-Type': 'application/n-quads',
        # 'Referer': dydraLogin,
    }

    paramsPOST = {
        'auth_token': key,
        #'url': sesameURL
    }
    # print payload

    try:
        q = s.post(dydraUrl, data=payloadPOST, params=paramsPOST, headers=headersPOST)
        print("q.text: " + q.text)
        print("q_status_code: " + str(q.status_code))
    except requests.exceptions.RequestException as e:
        print(e)

This is the error:

q_status_code: 400

However, if I comment out the 'url' attribute, I get this:

q_status_code: 201

Any ideas on how to resolve this would be very helpful.

UPDATE 2

Now, irrespective of whether 'url' is under headersPOST or paramsPOST, I get the following as output:

q_status_code: 201

However, the data that I want to post doesn't get POSTed. What do I need to do differently?

Answer 1:

I'm not gonna bother answering why you get that syntax error on that SPARQL update, since it seems immaterial to what you actually want to know. I'm also not going to bother answering how to upload an RDFLib graph to Dydra, since that also seems immaterial to what you want to know. What I'll answer here is how you can upload data from a Sesame store to a Dydra store, programmatically, without having to iterate over all triples, and without use of the SERVICE clause.

Dydra's REST API is basically identical to the Sesame REST API, so most REST operations you can do on a Sesame store you can also execute on a Dydra store.

You can do an HTTP POST request to your Dydra store's REST API URL for statements: repository/<ACCOUNT_ID>/<REPO_ID>/statements (see the Dydra docs for more details). Add a parameter url that points to your source Sesame store's URL for statements (repositories/<REPO_ID>/statements). Also make sure your POST request has a Content-Type HTTP header naming the MIME type of an RDF syntax format supported by Dydra (a good pick is TriG or N-Quads, since these formats support named graphs).

You don't even need RDFLib for any of this. Presumably you know how to do a simple HTTP request from Python; if not, I'm sure there are examples aplenty, as it's a fairly generic thing to do.
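As a rough sketch of the request described above, using only Python's standard library: the code builds the POST but does not send it. The host names, `ACCOUNT_ID`, `REPO_ID`, `rep_name` and `YOUR_KEY` are placeholders, and the helper name `build_copy_request` is hypothetical. Passing `url` and `auth_token` in the query string is an assumption (the `auth_token` parameter is taken from the question's code), so check it against the Dydra docs.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_copy_request(dydra_statements_url, sesame_statements_url, auth_token,
                       content_type="application/n-quads"):
    """Build (but do not send) a POST asking the target store to load from a source URL."""
    # Assumption: 'url' and 'auth_token' are sent as query-string parameters.
    query = urlencode({"auth_token": auth_token, "url": sesame_statements_url})
    return Request(
        dydra_statements_url + "?" + query,
        data=b"",  # empty body: the data is meant to come from the 'url' parameter
        headers={"Content-Type": content_type},
        method="POST",
    )


request = build_copy_request(
    "https://dydra.example/repository/ACCOUNT_ID/REPO_ID/statements",  # placeholder
    "http://sesame.example:8080/openrdf-sesame/repositories/rep_name/statements",  # placeholder
    auth_token="YOUR_KEY",  # placeholder
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

For this to work, the Sesame server's address has to be reachable from the Dydra side, since it is the target store that fetches the data.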