When I need to read millions of database rows from a PostgreSQL database using the JDBC driver, I always use a cursor, otherwise I will get an OutOfMemoryError. Here is the pattern (pseudocode) that I use:
begin transaction
execute("declare cursor...")
while (true) {
boolean processedSomeRows = false
resultSet = executeQuery("fetch forward...")
while (resultSet.next()) {
processedSomeRows = true
...
}
if (!processedSomeRows) break
}
close cursor
commit
This is a more "functional" equivalent that I came up with to be implemented in Scala:
begin transaction
execute("declare cursor...")
@tailrec
def loop(resultSet: ResultSet,
processed: Boolean): Boolean = {
if (!resultSet.next()) processed
else {
// Process current result set row
loop(resultSet, true)
}
}
while (loop(executeQuery("fetch forward..."), false))
; //Empty loop
close cursor
commit
I know this is contrived, but is there a better way without resorting to mutability? If I were trying to do this in Haskell, I might come up with a solution that involves monads, but I don't want to send my mind down those "twisty little passages, all alike," because it might never return...