I was trying to extract all movies from LinkedMDB. I used OFFSET to make sure I wouldn't hit the maximum number of results per query. I used the following script in Python:
"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
SELECT distinct ?film
WHERE {
?film a movie:film .
} LIMIT 1000 OFFSET %s """ %i
I looped 5 times, with offsets of 0, 1000, 2000, 3000, and 4000, and recorded the number of results for each. They were (1000, 1000, 500, 0, 0). I already knew the limit was 2500, but I thought that by using OFFSET I could get around it. Is that not true? Is there no way to get all the data (even with a loop of some sort)?
Your current query is legal, but there's no specified ordering, so the offset doesn't bring you to a predictable place in the results. (A lazy implementation could just return the same results over and over again.) When you use LIMIT and OFFSET, you also need to use ORDER BY. The SPARQL 1.1 specification says (emphasis added):