Weird behavior of Quickfix/j after “Sent test requ

2019-02-25 03:17发布

I'm using Quickfix/J to receive the message but "Sent test request TEST" happen. The log file (FIX.4.2-A-B.event.log)shows that:

23:19:05: Sent test request TEST   
23:19:32: Disconnecting: Timed out waiting for heartbeat   
23:19:33: Initiated logon request   
23:19:44: Disconnecting: Timed out waiting for logon response   
23:19:45: Initiated logon request   
23:19:56: Disconnecting: Timed out waiting for logon response  ...

But I find something in another log file(FIX.4.2-A-B.message.log):

8=FIX.4.2|9=68|35=1|34=250|49=A|52=20140224-23:19:05.909|56=B|112=TEST|10=106 
8=FIX.4.2|9=74|35=0|49=B|56=A|43=N|34=1320|52=20140224-23:19:23.381|112=TEST|10=130 

Which apparently indicates that the counter party B has already sent back the heart beat to us.

And the FIX.4.2-A-B.messages.log file is still growing!!!!!!!!!!! The file keep receiving messages but the Quickfix/J process none(nothing happens within onMessage() method)!!!!

Please help to walk me through why this is happening? why the connection is still lost after receiving heartbeat and the log is disconnecting?


Since this problem hasn't been solved yet. Here is the update:

My config:

[default]
FileStorePath=/target/data/fixapplication
ConnectionType=initiator
SenderCompID=A
TargetCompID=B
SocketConnectHost=xxx.xxx.xxx.xx
StartTime=00:00:00
EndTime=00:00:00
HeartBtInt=30
ReconnectInterval=5
FileLogPath=logs/fix/

[session]
BeginString=FIX.4.2
SocketConnectPort=xxx
ValidateFieldsOutOfOrder=N
ResetOnLogon=Y

Here's the codes:

@Override
    public void fromAdmin(Message message, SessionID sessionID) throws FieldNotFound,
            IncorrectDataFormat, IncorrectTagValue, RejectLogon {
        if (adminLog.isInfoEnabled())
            adminLog.info("Inside fromAdmin(): " + message.getHeader().getString(MsgType.FIELD)
                    + " ; " + message);
    }

    @Override
    public void fromApp(Message message, SessionID sessionID) throws FieldNotFound,
            IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType {

        crack(message, sessionID);
    }

    @Override
    public void onCreate(SessionID arg0) {

    }

    @Override
    public void onLogon(SessionID session) {
        if (adminLog.isInfoEnabled()) {
            adminLog.info("Inside onLogon: Logon completed " + session.toString());
        }
    }

    @Override
    public void onLogout(SessionID sessionId) {
        if (adminLog.isInfoEnabled()) {
            adminLog.info("Inside onLogout: Logout completed " + sessionId.toString());
        }
    }

Basically I let Quickfix/J to handle the connection itself.

The problem is that the app still keeping receiving messages but not processing them, and the log file shows that it's not connecting. All the messages are in the FIX.4.2-A-B.message.log, that's why it keep growing.

I find some similar cases:

http://www.quickfixj.org/jira/browse/QFJ-668

http://www.quickfixj.org/jira/browse/QFJ-624

http://quickfix-j.364392.n2.nabble.com/Timed-out-waiting-for-heartbeat-td365186.html

http://www.quickfixj.org/jira/browse/QFJ-759

Unfortunately none of them provides solution. So Please help me out.


The second update is as follow:

I extract the log files (I'm A and counter party is B):

8=FIX.4.2|9=58|35=0|34=49|49=A|52=23:23:50.075|56=B|10=030|
8=FIX.4.2|9=58|35=0|34=50|49=A|52=23:24:20.074|56=B|10=019|
**8=FIX.4.2|9=67|35=1|34=51|49=A|52=23:24:46.074|56=B|112=TEST|10=047|**
**8=FIX.4.2|9=74|35=0|49=B|56=A|43=N|34=1103|52=23:24:59.060|112=TEST|10=125|**
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:14.076|56=B|98=0|108=30|141=Y|10=059|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:25:27.076|98=0|108=30|10=004|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:25:27.077|112=03/03/2014-07:25:27|10=117|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:26.076|56=B|98=0|108=30|141=Y|10=062|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:25:39.064|98=0|108=30|10=004|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:25:39.065|112=03/03/2014-07:25:39|10=120|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:38.076|56=B|98=0|108=30|141=Y|10=065|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:25:51.064|98=0|108=30|10=254|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:25:51.065|112=03/03/2014-07:25:51|10=108|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:50.076|56=B|98=0|108=30|141=Y|10=059|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:03.064|98=0|108=30|10=252|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:26:03.065|112=03/03/2014-07:26:03|10=104|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:26:02.076|56=B|98=0|108=30|141=Y|10=057|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:15.064|98=0|108=30|10=255|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:26:15.065|112=03/03/2014-07:26:15|10=110|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:26:14.076|56=B|98=0|108=30|141=Y|10=060|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:27.064|98=0|108=30|10=002|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:26:27.065|112=03/03/2014-07:26:27|10=116|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:26:26.076|56=B|98=0|108=30|141=Y|10=063|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:39.065|98=0|108=30|10=006|

Please note that it keep trying to logon the session again but fail.

At the mean time, I am sure that the app keep receiving messages from counter party. The proof is that FIX.4.2-A-B.message.log file keeping growing with valid messages(not just heart beat but other valid message). That mean the connection isn't lost.

Why can't I logon the session?

Please help


The third updates:

My initiate successful logs for the logon is as below:

8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:00:07.095|56=B|98=0|108=30|141=Y|10=055|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:00:20.221|98=0|108=30|10=238|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:00:20.222|112=07:00:20|10=081|
8=FIX.4.2|9=81|35=0|34=2|49=A|52=23:00:07.301|56=B|112=07:00:20|10=091|
8=FIX.4.2|9=57|35=0|34=3|49=A|52=23:00:38.075|56=B|10=228|
8=FIX.4.2|9=63|35=0|49=B|56=A|43=N|34=26|52=23:00:51.260|10=00

The third line indicates that counter party send me a TEST request and I didn't respond, which seems ok and the connection established.

Should I explicitly handle the response of the TEST request? It seems quickfix/j would handle for me.

2条回答
三岁会撩人
2楼-- · 2019-02-25 04:00

Your analysis is generally ok, though here's a closer look at the times for each message:

23:19:05: Sent test request TEST  
23:19:05.909: Sent the 35=1 with 112=TEST
23:19:23.381: Received the 35=0 with 112=TEST
23:19:32: Disco timeout "waiting for heartbeat"
23:19:44: Disco timeout "waiting for logon response"

If there's something strange... in your network...

call the network team?

Something is causing your first disconnect. As the message says it's waiting for a heartbeat that probably doesn't include the 35=0 response to the test request. It's probably waiting for a 35=1 from B.

What kind of messages are filling up the FIX.4.2-A-B.messages.log? Can you post some examples?

查看更多
贪生不怕死
3楼-- · 2019-02-25 04:11

the FIX standard heartbeat behavior is as follows:

1. On Logon, the Initiator requests a heartbeat interval (usual default 30 seconds)
2. From now on every side expects at least one message every 30 seconds (defined heartbeat interval)
3. If there is no application message available, a heartbeat is sent instead
4. If one side does not get neither a heartbeart nor an application message after 30 seconds, a connection issue is suspected.
5. Therefore, a TestRequest is sent ("Are you still there?"): Sent test request TEST
6. This TestRequest is not answered by a Heartbeat with the supplied TestRequestID, the connection is considered dead.
7. Finally, the network connection is dropped: Disconnecting: Timed out waiting for heartbeat

In your third update, your initiator A did respond to the test request with a heartbeat, and supplied the requested id tag 112:

8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:00:20.222|112=07:00:20|10=081|
8=FIX.4.2|9=81|35=0|34=2|49=A|52=23:00:07.301|56=B|112=07:00:20|10=091|

So that's good. From then on, A supplies a heartbeat 30 seconds later:

8=FIX.4.2|9=57|35=0|34=3|49=A|52=23:00:38.075|56=B|10=228|

But B does not reply for 13 seconds:

8=FIX.4.2|9=63|35=0|49=B|56=A|43=N|34=26|52=23:00:51.260|10=00

Is there some network issue between A and B? What problem are you experiencing?

查看更多
登录 后发表回答