I have few tables with big amount of data (about 100 million records). So I can't store this data in memory but I would like to stream this result set using java.util.stream
class and pass this stream to another class. I read about Stream.of
and Stream.Builder
operators but they are buffered streams in memory. So is there any way to resolve this question?
Thanks in advance.
UPDATE #1
Okay I googled and found jooq library. I'm not sure but looks like it could be applicable to my test case. To summarize I have few tables with big amount of data. I would like to stream my resultset and transfer this stream to another method. Something like this:
// why return Stream<String>? Because my result set has String type
private Stream<Record> writeTableToStream(DataSource dataSource, String table) {
Stream<Record> record = null;
try (Connection connection = dataSource.getConnection()) {
String sql = "select * from " + table;
try (PreparedStatement pSt = connection.prepareStatement(sql)) {
connection.setAutoCommit(false);
pSt.setFetchSize(5000);
ResultSet resultSet = pSt.executeQuery();
//
record = DSL.using(connection)
.fetch(resultSet).stream();
}
} catch (SQLException sqlEx) {
logger.error(sqlEx);
}
return record;
}
Could please someone advise, am I on correct way? Thanks.
UPDATE #2
I made some experiment on jooq and could say now that above decision is not suitable for me. This code record = DSL.using(connection).fetch(resultSet).stream();
takes too much time
The first thing you have to understand is that code like
does not work as by the time you leave the
try
blocks, the resources are closed while the processing of theStream
hasn’t even started.The resource management construct “try with resources” works for resources used within a block scope inside a method but you are creating a factory method returning a resource. Therefore you have to ensure that the closing of the returned stream will close the resources and the caller is responsible for closing the
Stream
.Further, you need a function which produces an item out of a single line from the
ResultSet
. Supposing, you have a method likeyou may create a
Stream<Record>
basically likeBut to do it correctly you have to incorporate the exception handling and closing of resources. You can use
Stream.onClose
to register an action that will be performed when theStream
gets closed, but it has to be aRunnable
which can not throw checked exceptions. Similarly thetryAdvance
method is not allowed to throw checked exceptions. And since we can’t simply nesttry(…)
blocks here, the program logic of suppression exceptions thrown inclose
, when there is already a pending exception, doesn’t come for free.To help us here, we introduce a new type which can wrap closing operations which may throw checked exceptions and deliver them wrapped in an unchecked exception. By implementing
AutoCloseable
itself, it can utilize thetry(…)
construct to chain close operations safely:With this, the entire operation becomes:
This method wraps the necessary close operation for all resources,
Connection
,Statement
andResultSet
within one instance of the utility class described above. If an exception happens during the initialization, the close operation is performed immediately and the exception is delivered to the caller. If the stream construction succeeds, the close operation is registered viaonClose
.Therefore the caller has to ensure proper closing like
Note that also the delivery of an
SQLException
viaRuntimeException
has been added to thetryAdvance
method. Therefore you may now addthrows SQLException
to thecreateRecord
method without problems.Here is the simplest sample by AbacusUtil.
Disclosure: I'm the developer of AbacusUtil.
I'm going to answer the jOOQ part of your question. As of jOOQ 3.8, there have now been quite a few additional features related to combining jOOQ with Stream. Other usages are also documented on this jOOQ page.
Your suggested usage:
You tried this:
Indeed, this doesn't work well for large result sets because
fetch(ResultSet)
fetches the entire result set into memory and then callsCollection.stream()
on it.Better (lazy) usage:
Instead, you could write this:
... which is essentially convenience for this:
See also
DSLContext.fetchStream(ResultSet)
Of course, you could also let jOOQ execute your SQL string, rather than wrestling with JDBC:
On try-with-resources usage
Do note that a
Stream
produced by jOOQ is "resourceful", i.e. it contains a reference to an openResultSet
(andPreparedStatement
). So, if you really want to return that stream outside of your method, make sure it is closed properly!I'm not aware of any well-known library that will do it for you.
That said, this article shows how to wrap the resultset with an Iterator (ResultSetIterator) and pass it as the first parameter to
Spliterators.spliteratorUnknownSize()
in order to create aSpliterator
.The Spliterator can then be used by
StreamSupport
in order to create a Stream on top of it.Their suggested implementation of
ResultSetIterator
class:and then: