I am querying all 10 of my tables to get the user IDs from them and loading those IDs into a HashSet so that I end up with the unique user IDs.
Right now this is done sequentially: we go to the first table, extract every user_id from it and load it into the hash set, then move on to the second table, the third, and so on.
    private Set<String> getRandomUsers() {
        Set<String> userList = new HashSet<String>();
        // is there any way to make this parallel?
        for (int table = 0; table < 10; table++) {
            String sql = "select * from testkeyspace.test_table_" + table + ";";
            try {
                SimpleStatement query = new SimpleStatement(sql);
                query.setConsistencyLevel(ConsistencyLevel.QUORUM);
                ResultSet res = session.execute(query);
                Iterator<Row> rows = res.iterator();
                while (rows.hasNext()) {
                    Row r = rows.next();
                    String user_id = r.getString("user_id");
                    userList.add(user_id);
                }
            } catch (Exception e) {
                System.out.println("error= " + ExceptionUtils.getStackTrace(e));
            }
        }
        return userList;
    }
Is there any way to make this multithreaded so that the data is fetched from each table in parallel? At the end, I need the userList HashSet to contain all the unique user IDs from all 10 tables.
I am working with a Cassandra database, and the connection is made only once, so I don't need to create multiple connections.
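You may be able to make it multithreaded, but with the overhead of thread creation and multiple connections you probably won't see a significant benefit. Instead, use a UNION statement in MySQL and get them all at once (note that Cassandra's CQL has no UNION, so this only applies if the data is reachable through a SQL engine). Let the database engine figure out how to get them all efficiently, with a query of the form: select user_id from test_table_0 UNION select user_id from test_table_1 UNION ....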
Of course, you'll have to create the SQL query string programmatically. Don't actually put "...." in your query.
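As a rough illustration of building that string in code, here is a minimal sketch. The class name UnionQueryBuilder and the method buildUnionQuery are made up for this example, and it assumes a SQL database that supports UNION (which removes duplicate rows, so the result already contains each user_id once). It only assembles the text of the query; running it is left to whatever client you use.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of any library.
class UnionQueryBuilder {

    // Produces "select user_id from <prefix>0 UNION select user_id from <prefix>1 ..."
    // with one SELECT per table, numbered 0 to tableCount - 1.
    static String buildUnionQuery(String tablePrefix, int tableCount) {
        StringJoiner joiner = new StringJoiner(" UNION ");
        for (int table = 0; table < tableCount; table++) {
            joiner.add("select user_id from " + tablePrefix + table);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Table prefix taken from the question; assumed to exist in a SQL database.
        System.out.println(buildUnionQuery("test_table_", 10));
    }
}
```

Since the table names here come from a fixed prefix and a loop counter, plain concatenation is safe; anything that comes from user input should not be spliced into the query this way.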
If you're able to use Java 8, you could probably do this using parallelStream against a list of the tables, with a lambda that expands each table name into the set of unique IDs for that table, and then join the results together into a single hash set.

Without Java 8, I'd use Google Guava's listenable futures together with an executor service.
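A minimal sketch of the executor approach follows. It uses plain java.util.concurrent Futures rather than Guava's ListenableFuture so that it runs without extra dependencies. The names ParallelUserIds, fetchUserIds and getUsers are made up for this example, and fetchUserIds is only a stand-in that returns fake IDs; in the real code it would run the per-table query through the shared session, which the DataStax driver documents as safe to use from multiple threads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelUserIds {

    // Hypothetical stand-in for the per-table query. The real version would
    // execute "select user_id from testkeyspace.test_table_<table>" and collect
    // r.getString("user_id") for every row.
    static Set<String> fetchUserIds(int table) {
        return new HashSet<String>(Arrays.asList("user" + table, "user" + (table + 1)));
    }

    static Set<String> getUsers(int tableCount) {
        ExecutorService pool = Executors.newFixedThreadPool(tableCount);
        try {
            // One task per table; each task builds its own set, so the worker
            // threads never touch a shared collection.
            List<Future<Set<String>>> futures = new ArrayList<Future<Set<String>>>();
            for (int table = 0; table < tableCount; table++) {
                final int t = table;
                futures.add(pool.submit(new Callable<Set<String>>() {
                    @Override
                    public Set<String> call() {
                        return fetchUserIds(t);
                    }
                }));
            }
            // Merge on the calling thread; get() blocks until that task is done.
            Set<String> userList = new HashSet<String>();
            for (Future<Set<String>> future : futures) {
                try {
                    userList.addAll(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a table", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("query task failed", e.getCause());
                }
            }
            return userList;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(getUsers(10).size());
    }
}
```

Having each task return its own set and merging at the end avoids synchronizing on a shared HashSet, which is not thread-safe.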
The use of Executors and Futures is all core Java. The only thing Guava does is let me turn Futures into ListenableFutures. See here for a discussion of why the latter is better.
There are probably still ways to improve the parallelism of this approach, but if the bulk of your time is being spent in waiting for the DB to respond or in processing network traffic, then this approach may help.
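For the Java 8 route, a sketch of the parallelStream idea might look like the following. As before, the class name ParallelStreamUserIds and the fetchUserIds helper are invented for illustration, and the helper returns fake IDs in place of the real per-table query.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamUserIds {

    // Hypothetical stand-in for the per-table query against
    // testkeyspace.test_table_<table>.
    static Set<String> fetchUserIds(int table) {
        return new HashSet<>(Arrays.asList("user" + table, "user" + (table + 1)));
    }

    static Set<String> getUsers(int tableCount) {
        // The "list of the tables": here just the table numbers 0..tableCount-1.
        List<Integer> tables = IntStream.range(0, tableCount)
                .boxed()
                .collect(Collectors.toList());

        // Expand each table into its IDs in parallel, then collect everything
        // into one set, which drops duplicates across tables.
        return tables.parallelStream()
                .flatMap(table -> fetchUserIds(table).stream())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(getUsers(10).size());
    }
}
```

One caveat: parallel streams run on the common fork/join pool, whose size is tied to the number of CPU cores, so for work that mostly waits on the database an explicit executor gives you more control over how many queries are in flight.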