Batch transfer rate upper bound in a channel?

Posted 2019-03-06 10:47

Question:

We use MQ 7.1.0.3 (yes, it's an old version, and we plan to upgrade to v9 soon) for Q-Replication, and recently ran into throttling at the MQ tier. Specifically, messages get stuck on the XMITQ and cannot reach the other side quickly enough. We have used the default settings for both the SDR and RCVR channels for a long time, and now realize that MQ tuning is probably necessary to cope with the increased Q-Replication volume.

We understand that a batch is cut when any one of the following conditions is met:

  • BATCHSZ (50) reached;
  • BATCHLIM (5000KB) reached;
  • SDR queue empty (unlikely to be what we experienced, since the XMITQ was heavily backed up).
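To make that understanding concrete, here is a minimal sketch of the three conditions as a single predicate. It only models the rule as stated in this post and is not how the MCA is actually implemented. The function name and parameters are hypothetical, and other channel attributes (BATCHINT, for example) are deliberately ignored.

```python
def batch_is_cut(msgs_in_batch, bytes_in_batch, xmitq_depth,
                 batchsz=50, batchlim_kb=5000):
    """Return True if any of the three batch-cut conditions holds.

    Hypothetical model of the rule described in the post, not MQ code.
    Defaults mirror the values quoted there: BATCHSZ(50), BATCHLIM(5000 KB).
    """
    reached_batchsz = msgs_in_batch >= batchsz
    reached_batchlim = bytes_in_batch >= batchlim_kb * 1024
    queue_empty = xmitq_depth == 0
    return reached_batchsz or reached_batchlim or queue_empty


# With a deep XMITQ and small messages, only BATCHSZ ends the batch.
print(batch_is_cut(msgs_in_batch=50, bytes_in_batch=100_000, xmitq_depth=20_000))
print(batch_is_cut(msgs_in_batch=10, bytes_in_batch=100_000, xmitq_depth=20_000))
```

With a backed-up XMITQ and small messages, this model says every batch would be a full BATCHSZ batch, which is the case the rest of the question is about.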

The question is: how often does the MCA on the sender side send a batch to the other side? (Leaving pipelining aside; we are still on v7.1, which doesn't seem to have the pipelining feature anyway.) Is a batch sent immediately after it is cut, or does it have to wait until delivery of the previous batch has completed?

We are trying to estimate the theoretical maximum MQ transfer rate, given a known network ping time (~20 ms) and relatively stable RCVR-side MQ performance.

Btw, it's hosted on Red Hat, around version 6.8 (I can't remember the exact version; I'm not the sysadmin).

Answer 1:

Live post, updated with our findings as we go:

  1. MAX_TRANS might not be accurate and probably won't help us much. MAX_TRANS controls the rate at which QCapture puts messages onto the sendQ (IBM KC link). Our problem is throttling of the MQ transfer rate. With our config "TRANS_BATCH_SZ=1+MAX_TRANS=128+COMMIT_INTERVAL=500+MONITOR_INTERVAL=30000" (all collected from the qcapture log, so we're pretty sure those are the settings in use), we would expect MQ_MESSAGES to be no more than around 8K (128 tx/sec * 2 * 30 sec), but in fact our IBMQREP_CAPQMON.MQ_MESSAGES exceeds 20K most of the time. It's also possible that those extra messages come from LOBs, since we use LOB_SEND_OPTION=S to avoid the hassle of handling LOB_TOO_BIG errors.
  2. A TRANS_BATCH_SZ larger than 1 requires LOB_SEND_OPTION=I (IBM KC link), which is a blocker for us.
  3. [added on Dec. 20] An interesting thought occurred to me: if the second batch is not constructed until the ACK for the previous batch comes back, then, under the following assumptions, the root problem might be on the SDR side.

    • on the SDR side it takes a (relatively) fixed time to add a message to a batch (e.g. 0.95 ms);
    • a fixed ping time of 20 ms;
    • zero delay on the RCVR side, meaning the RCVR side sends back the ACK immediately.

    Using the default BATCHSZ=50, each batch would take 0.95 ms * 50 + 20 ms = 67.5 ms to complete, yielding about 14.8 batches/sec, or 14.8 * 50 = 740 msg/sec;

    Using BATCHSZ=100, each batch would take 0.95 ms * 100 + 20 ms = 115 ms to complete, yielding about 8.7 batches/sec, or 8.7 * 100 = 870 msg/sec.

    Those numbers seem to match our observations and experiments, but we still need to verify whether the assumptions are valid.
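    The arithmetic in item 3 can be reproduced with a small script, which makes it easy to try other BATCHSZ values. This is only a sketch of the simplified model above (strictly serial batches, fixed per-message cost, fixed round trip, instant ACK); the function name and its parameters are hypothetical, and the 0.95 ms and 20 ms figures are the assumed values from the list, not measurements.

```python
def batch_model(batchsz, per_msg_ms=0.95, round_trip_ms=20.0):
    """Estimate throughput when a new batch starts only after the previous ACK.

    Assumed values (from the post, unverified): 0.95 ms to add one message
    to a batch, 20 ms ping time, zero delay on the RCVR side.
    Returns (ms per batch, batches per second, messages per second).
    """
    batch_ms = per_msg_ms * batchsz + round_trip_ms
    batches_per_sec = 1000.0 / batch_ms
    msgs_per_sec = batches_per_sec * batchsz
    return batch_ms, batches_per_sec, msgs_per_sec


for size in (50, 100, 200, 500):
    batch_ms, bps, mps = batch_model(size)
    print(f"BATCHSZ={size}: {batch_ms:.1f} ms/batch, "
          f"{bps:.1f} batches/sec, {mps:.0f} msg/sec")
```

    Under these assumptions, raising BATCHSZ gives diminishing returns: the rate can never exceed 1 / 0.95 ms, roughly 1053 msg/sec, because the per-message cost on the SDR side dominates once the batch is large enough to amortize the round trip.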



Tags: ibm-mq