Developing a Java application that share a single Connection between mutiple threads, the problem of concurrency arise.
If thread A updates record 1 in table T, and simultaneously thread B issues a SELECT on record 1 in table T, how do I ensure thread B reads the updated values of thread A?
java.sql.Connection offers transactions with begin(), commit() and rollback(), but does this process also cover data correctness?
I think I'm missing something.
Two points:
- You shouldn't share a
jdbc.Connection
between threads, at least for any 'seriously production' code, see here. For demo purposes, I think, sharing a Connection is OK;
- If a thread reads from DB after relevant DB transaction is committed, it will see data written by another thread.
For your second question
will thread B timeout until the first transaction has commit() or rollback()
-- B
will block till A
tx is finished (either by commit or rollback) if:
B
tries to update/delete same table row which is being updated by A
, and ...
A
updates that row under DB-level lock, using SELECT ... FOR UPDATE
.
You can get this behavior using two consoles (for example, with PostgreSQL psql
), each console stands for a thread:
in A
console type following:
BEGIN;
SELECT some_col FROM some_tbl WHERE some_col = some_val FOR UPDATE;
now in B
console type:
BEGIN;
UPDATE some_tbl SET some_col = new_val WHERE some_col = some_val;
You should see that UPDATE
blocks until in A
you do either COMMIT
or ROLLBACK
.
Above explanation uses separate DB connections, just like Java JDBC connection pool. When you share single connection between Java threads, I think, any interaction with DB will block if connection is used by some other thread.
Jdbc is a standard that is broadly adopted but with uneven levels of adherence, it is probably not good to make sweeping statements about what is safe.
I would not expect there is anything to keep statement executions and commits and rollbacks made from multiple threads from getting interleaved. Best case, only one thread can use the connection at a time and the others block, making multithreading useless.
If you don't want to provide a connection to each thread, you could have the threads submit work items to a queue that is consumed by a single worker thread handling all the jdbc work. But it's probably less impact on existing code to introduce a connection pool.
In general if you have concurrent updates and reads then they happen in the order that they happen. Locking and isolation levels provide consistency guarantees for concurrent transactions but if one hasn't started its transaction yet those aren't applicable. You could have a status flag, version number, or time stamp on each row to indicate when an update occurred.
If you have a lot of updates it can be better to collect them in a flat file and execute a bulk copy. It can be much faster than using jdbc. Then with updates out of the way execute selects in jdbc.