In relation to another post of mine, I realized that there is more we can say on stackoverflow in relation to the Distributed, XA transactions and its internals. Common opinion is that distributed transactions are slow.
What are the XA transactions internals and how can we tune them ?
First lets put some common vocabulary. We have two or more parties
- Transaction Coordinator this is where our business logic resides. This is the side that orchestrates the distributed transaction.
- Transaction Participant (XAResource) this can be any Dababase supporting distributed transactions or some other entity supporting the XA protocol like messaging service.
Lets highlight the main API functions performed during XA transaction.
- start(XID)
- end(XID)
- prepare(XID)
- commit(XID)
The first 2 operations are visible in our source code. This is when we initiate the transaction do some work and then say commit. Once we send the commit message from the source code the Transaction Coordinator and the transaction Participant take over and do some work.
XID parameter is used as a unique key identifying the transaction. Each transaction coordinator and each participant at any time can participate in more than one transaction so this is needed in order to identify them. The XID has two parts one part identifies the global transaction, the second part identifies the participant. This mean that each participant in the same transaction will have its own sub identifier.
Once we reach the transaction prepare phase , each transaction participant writes its work to the transaction log and each Transaction Participant(XARersource) votes if its part is OK or FAILED. Once all votes are received the transaction is committed.
If the power goes down the both the Transaction Coordinator and the Transaction Participant keep their transaction logs durable and can presume their work. If one of the participant vote FAILED during transaction commit then subsequent rollback will be initiated.
Implications in terms of performance
According to the CAP theorem each application(functionality) falls somewhere in between the triangle defined by Consistency, Partitioning and Availability. The main issue with the XA/ Distributed transaction is that it requires extreme consistency.
This requirement results into very high network and disk IO activity.
Disk activity Both the transaction coordinator and the transaction participant need to maintain a transaction log. This log is held on the disk each transaction needs to force information withing this disklog, this information is not buffered information. Having large parallelism will result in high amount of small messages forced to the disk in each transaction log. Normally if we copy one 1GB file from one hard disk to another hard disk this will be very fast operation. If we split the file into 1 000 000 parts of couple of bytes each the file transfer will be extremely slow.
Disk forcing grows with the number of participants.
1 participant is treated as normal transaction
2 participants the
disk forcing is 5
3 equals 7
Network Activity
In order to draw a parallel for distributed XATransaction we need to compare it to something. The network activity during normal transaction is the following. 3 network trips -enlist transaction, send some SQLs, commit.
For a XA transaction it is one idea more complicated. If we have 2 Participants.
We enlist the resources in a transaction 2 network trips. Then we send prepare message another 2 trips then we commit with another 2 trips.
The actual network activity that is happening for 2 resources grows even more the more participants you enlist in the transaction.
The conclusion on how to get a distributed transaction fast
- To do this you need to ensure you have a fast network with minimum latency
- Ensure you have Hard drives with minimum latency and maximum random write speed. A good SSD can do miracle.
-Try to enlist as minimum as possible distributed resources in the transaction
- Try to divide your data into data that has strong requirement for Consistency and Availability (Live data) and data that has low consistency requirements. Live data use Distributed transaction. For offline data use normal transaction, or no transaction if your data does not require it.
My answer is based on what I have read in "XA exposed" (and personal experience) which appears to be no longer available on internet which triggered me to write this.