OrientDB automatic pagination returning duplicates

Posted 2019-09-14 02:49

Question:

I'd like to iterate through a very large set of records in OrientDB. To keep the result from filling up my machine's memory, I've tried to implement paginated queries, but I seem to be getting back:

  • duplicated documents
  • record sets shorter than the page size
  • an infinite series of results

The original Java method listed in the docs is as follows:

OSQLSynchQuery<ODocument> query = new OSQLSynchQuery<ODocument>("select from Customer LIMIT 20");
for (List<ODocument> resultset = database.query(query); !resultset.isEmpty(); resultset = database.query(query)) {
    ...
}

I've implemented this in Scala:

val query = new OSQLSynchQuery[ODocument]("select from Thing LIMIT 5")
var resultset = db.query[OResultSet[ODocument]](query)
while (!resultset.isEmpty()) {
  // process result set here
  resultset = db.query(query)
}

Here's the full example:

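// Imports assumed for OrientDB 2.1.x
import com.orientechnologies.orient.core.db.ODatabase.ATTRIBUTES._
import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx
import com.orientechnologies.orient.core.record.impl.ODocument
import com.orientechnologies.orient.core.sql.query.{OResultSet, OSQLSynchQuery}

// Builds an unsaved document of class "Thing" with a single integer field "x"
def makeThing(x: Int): ODocument = {
  val doc = new ODocument("Thing")
  doc.field("x", x)
  doc
}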

val db: ODatabaseDocumentTx = new ODatabaseDocumentTx("memory:jsondb")
db.create()
db.set(MINIMUMCLUSTERS, 3)
db.set(CLUSTERSELECTION, "round-robin")
db.set(CONFLICTSTRATEGY, "content")
db.set(CHARSET, "UTF-8")


println("SAVING--------")

for (x <- 0 until 12) {
  val doc:ODocument = makeThing(x)
  val saved = db.save[ODocument](doc)
  println(saved)
}


println("\n\nQUERYING--------")

val query = new OSQLSynchQuery[ODocument]("select from Thing LIMIT 5")
var resultset = db.query[OResultSet[ODocument]](query)
while (!resultset.isEmpty()) {
  resultset.toArray.foreach(println)
  resultset = db.query(query)
  println("---------")
}
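
For comparison, the behaviour one would expect from this loop can be modelled without OrientDB at all. The sketch below is plain Java and rests on an assumption about what "paginated" should mean here: a cursor remembers the last record ID it returned, and each page holds the next 5 IDs after it. `PagingModel`, `Rid`, `nextPage` and `allPages` are hypothetical names, not OrientDB classes, and the IDs #9:0 to #11:3 are the ones the save loop prints when run.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of cursor-style paging. No OrientDB classes are used.
final class PagingModel {

    // Hypothetical stand-in for a record ID such as #9:0 (cluster 9, position 0).
    static final class Rid {
        final int cluster;
        final int position;
        Rid(int cluster, int position) { this.cluster = cluster; this.position = position; }
        @Override public String toString() { return "#" + cluster + ":" + position; }
    }

    // Order by cluster first, then by position inside the cluster.
    static final Comparator<Rid> ORDER =
        Comparator.<Rid>comparingInt(r -> r.cluster).thenComparingInt(r -> r.position);

    // Up to `limit` records that sort strictly after `last` (null means "from the start").
    static List<Rid> nextPage(List<Rid> sorted, Rid last, int limit) {
        List<Rid> page = new ArrayList<>();
        for (Rid r : sorted) {
            if (page.size() == limit) break;
            if (last == null || ORDER.compare(r, last) > 0) page.add(r);
        }
        return page;
    }

    // Same loop shape as the query loop: keep fetching until a page comes back empty.
    static List<List<Rid>> allPages(List<Rid> records, int limit) {
        List<Rid> sorted = new ArrayList<>(records);
        sorted.sort(ORDER);
        List<List<Rid>> pages = new ArrayList<>();
        Rid last = null;
        for (List<Rid> page = nextPage(sorted, last, limit); !page.isEmpty();
             page = nextPage(sorted, last, limit)) {
            pages.add(page);
            last = page.get(page.size() - 1); // advance the cursor
        }
        return pages;
    }

    // The 12 record IDs from the example: clusters 9 to 11, positions 0 to 3.
    static List<Rid> sampleRecords() {
        List<Rid> records = new ArrayList<>();
        for (int cluster = 9; cluster <= 11; cluster++) {
            for (int position = 0; position < 4; position++) {
                records.add(new Rid(cluster, position));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (List<Rid> page : allPages(sampleRecords(), 5)) {
            System.out.println(page);
        }
    }
}
```

Under that assumption the 12 records come back as pages of 5, 5 and 2 with no ID repeated, and the next fetch is empty, which ends the loop.
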

But here's the output:

SAVING--------
Thing#9:0{x:0} v1
Thing#10:0{x:1} v1
Thing#11:0{x:2} v1
Thing#9:1{x:3} v1
Thing#10:1{x:4} v1
Thing#11:1{x:5} v1
Thing#9:2{x:6} v1
Thing#10:2{x:7} v1
Thing#11:2{x:8} v1
Thing#9:3{x:9} v1
Thing#10:3{x:10} v1
Thing#11:3{x:11} v1



QUERYING--------
Thing#9:0{x:0} v1
Thing#9:1{x:3} v1
Thing#9:2{x:6} v1
Thing#9:3{x:9} v1
Thing#10:0{x:1} v1  # So far, so good...
---------
Thing#9:0{x:0} v1   # Already seen this. I might have expected that last item of the previous set, but not the first. Perhaps I'm supposed to skip this?
Thing#10:1{x:4} v1
Thing#10:2{x:7} v1
Thing#10:3{x:10} v1
Thing#11:0{x:2} v1
---------
Thing#9:0{x:0} v1    # Already seen this
Thing#11:1{x:5} v1
Thing#11:2{x:8} v1
Thing#11:3{x:11} v1  # Page cut short. Is this because I've reached the end of my data? Should I detect this as the end?
---------
Thing#9:0{x:0} v1   # Already seen this! Only 1 item again.
---------
Thing#9:1{x:3} v1   # Hmm... longer again. OK...?
Thing#9:2{x:6} v1
Thing#9:3{x:9} v1
Thing#10:0{x:1} v1
Thing#10:1{x:4} v1

... goes on forever...
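
The pages above can also be checked mechanically. The following plain-Java sketch takes the five pages exactly as printed and reports which record IDs repeat and which page is the first to contain nothing new. `PageAudit` and its methods are hypothetical helper names, not part of OrientDB.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Audits a sequence of result pages for repeated record IDs.
final class PageAudit {

    // The five pages copied from the query output, as record-ID strings.
    static final List<List<String>> OBSERVED = Arrays.asList(
        Arrays.asList("#9:0", "#9:1", "#9:2", "#9:3", "#10:0"),
        Arrays.asList("#9:0", "#10:1", "#10:2", "#10:3", "#11:0"),
        Arrays.asList("#9:0", "#11:1", "#11:2", "#11:3"),
        Arrays.asList("#9:0"),
        Arrays.asList("#9:1", "#9:2", "#9:3", "#10:0", "#10:1"));

    // Record IDs that appear more than once, in the order their first repeat is seen.
    static Set<String> repeatedRids(List<List<String>> pages) {
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>();
        for (List<String> page : pages) {
            for (String rid : page) {
                if (!seen.add(rid)) repeated.add(rid);
            }
        }
        return repeated;
    }

    // Index of the first page that contributes no unseen ID, or -1 if every page does.
    static int firstPageWithNothingNew(List<List<String>> pages) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < pages.size(); i++) {
            boolean addedNew = false;
            for (String rid : pages.get(i)) {
                if (seen.add(rid)) addedNew = true;
            }
            if (!addedNew) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("repeated: " + repeatedRids(OBSERVED));
        System.out.println("first page with nothing new: " + firstPageWithNothingNew(OBSERVED));
    }
}
```

On this data the first page with nothing new is the fourth one (index 3), by which point all 12 records have been seen once. Stopping there would be a workaround rather than a fix, since the set of seen IDs grows with the data.
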

Note that the DB is in memory, and no-one is simultaneously writing to the DB.

I'm using the OrientDB (ODB) client, version 2.1.19.