I'm using Quickfix/J to receive the message but "Sent test request TEST" happen. The log file (FIX.4.2-A-B.event.log)shows that:
23:19:05: Sent test request TEST
23:19:32: Disconnecting: Timed out waiting for heartbeat
23:19:33: Initiated logon request
23:19:44: Disconnecting: Timed out waiting for logon response
23:19:45: Initiated logon request
23:19:56: Disconnecting: Timed out waiting for logon response ...
But I find something in another log file(FIX.4.2-A-B.message.log):
8=FIX.4.2|9=68|35=1|34=250|49=A|52=20140224-23:19:05.909|56=B|112=TEST|10=106
8=FIX.4.2|9=74|35=0|49=B|56=A|43=N|34=1320|52=20140224-23:19:23.381|112=TEST|10=130
Which apparently indicates that the counter party B has already sent back the heart beat to us.
And the FIX.4.2-A-B.messages.log file is still growing!!!!!!!!!!! The file keep receiving messages but the Quickfix/J process none(nothing happens within onMessage() method)!!!!
Please help to walk me through why this is happening? why the connection is still lost after receiving heartbeat and the log is disconnecting?
Since this problem hasn't been solved yet. Here is the update:
My config:
[default]
FileStorePath=/target/data/fixapplication
ConnectionType=initiator
SenderCompID=A
TargetCompID=B
SocketConnectHost=xxx.xxx.xxx.xx
StartTime=00:00:00
EndTime=00:00:00
HeartBtInt=30
ReconnectInterval=5
FileLogPath=logs/fix/
[session]
BeginString=FIX.4.2
SocketConnectPort=xxx
ValidateFieldsOutOfOrder=N
ResetOnLogon=Y
Here's the codes:
@Override
public void fromAdmin(Message message, SessionID sessionID) throws FieldNotFound,
IncorrectDataFormat, IncorrectTagValue, RejectLogon {
if (adminLog.isInfoEnabled())
adminLog.info("Inside fromAdmin(): " + message.getHeader().getString(MsgType.FIELD)
+ " ; " + message);
}
@Override
public void fromApp(Message message, SessionID sessionID) throws FieldNotFound,
IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType {
crack(message, sessionID);
}
@Override
public void onCreate(SessionID arg0) {
}
@Override
public void onLogon(SessionID session) {
if (adminLog.isInfoEnabled()) {
adminLog.info("Inside onLogon: Logon completed " + session.toString());
}
}
@Override
public void onLogout(SessionID sessionId) {
if (adminLog.isInfoEnabled()) {
adminLog.info("Inside onLogout: Logout completed " + sessionId.toString());
}
}
Basically I let Quickfix/J
to handle the connection itself.
The problem is that the app still keeping receiving messages but not processing them, and the log file shows that it's not connecting. All the messages are in the FIX.4.2-A-B.message.log, that's why it keep growing.
I find some similar cases:
http://www.quickfixj.org/jira/browse/QFJ-668
http://www.quickfixj.org/jira/browse/QFJ-624
http://quickfix-j.364392.n2.nabble.com/Timed-out-waiting-for-heartbeat-td365186.html
http://www.quickfixj.org/jira/browse/QFJ-759
Unfortunately none of them provides solution. So Please help me out.
The second update is as follow:
I extract the log files (I'm A and counter party is B):
8=FIX.4.2|9=58|35=0|34=49|49=A|52=23:23:50.075|56=B|10=030|
8=FIX.4.2|9=58|35=0|34=50|49=A|52=23:24:20.074|56=B|10=019|
**8=FIX.4.2|9=67|35=1|34=51|49=A|52=23:24:46.074|56=B|112=TEST|10=047|**
**8=FIX.4.2|9=74|35=0|49=B|56=A|43=N|34=1103|52=23:24:59.060|112=TEST|10=125|**
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:14.076|56=B|98=0|108=30|141=Y|10=059|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:25:27.076|98=0|108=30|10=004|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:25:27.077|112=03/03/2014-07:25:27|10=117|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:26.076|56=B|98=0|108=30|141=Y|10=062|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:25:39.064|98=0|108=30|10=004|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:25:39.065|112=03/03/2014-07:25:39|10=120|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:38.076|56=B|98=0|108=30|141=Y|10=065|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:25:51.064|98=0|108=30|10=254|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:25:51.065|112=03/03/2014-07:25:51|10=108|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:25:50.076|56=B|98=0|108=30|141=Y|10=059|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:03.064|98=0|108=30|10=252|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:26:03.065|112=03/03/2014-07:26:03|10=104|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:26:02.076|56=B|98=0|108=30|141=Y|10=057|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:15.064|98=0|108=30|10=255|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:26:15.065|112=03/03/2014-07:26:15|10=110|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:26:14.076|56=B|98=0|108=30|141=Y|10=060|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:27.064|98=0|108=30|10=002|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:26:27.065|112=03/03/2014-07:26:27|10=116|
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:26:26.076|56=B|98=0|108=30|141=Y|10=063|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:26:39.065|98=0|108=30|10=006|
Please note that it keep trying to logon the session again but fail.
At the mean time, I am sure that the app keep receiving messages from counter party. The proof is that FIX.4.2-A-B.message.log file keeping growing with valid messages(not just heart beat but other valid message). That mean the connection isn't lost.
Why can't I logon the session?
Please help
The third updates:
My initiate successful logs for the logon is as below:
8=FIX.4.2|9=75|35=A|34=1|49=A|52=23:00:07.095|56=B|98=0|108=30|141=Y|10=055|
8=FIX.4.2|9=74|35=A|49=B|56=A|43=N|34=1|52=23:00:20.221|98=0|108=30|10=238|
8=FIX.4.2|9=86|35=1|49=B|56=A|43=N|34=2|52=23:00:20.222|112=07:00:20|10=081|
8=FIX.4.2|9=81|35=0|34=2|49=A|52=23:00:07.301|56=B|112=07:00:20|10=091|
8=FIX.4.2|9=57|35=0|34=3|49=A|52=23:00:38.075|56=B|10=228|
8=FIX.4.2|9=63|35=0|49=B|56=A|43=N|34=26|52=23:00:51.260|10=00
The third line indicates that counter party send me a TEST request and I didn't respond, which seems ok and the connection established.
Should I explicitly handle the response of the TEST request? It seems quickfix/j would handle for me.
Your analysis is generally ok, though here's a closer look at the times for each message:
If there's something strange... in your network...
call the network team?
Something is causing your first disconnect. As the message says it's waiting for a heartbeat that probably doesn't include the 35=0 response to the test request. It's probably waiting for a 35=1 from B.
What kind of messages are filling up the FIX.4.2-A-B.messages.log? Can you post some examples?
the FIX standard heartbeat behavior is as follows:
In your third update, your initiator A did respond to the test request with a heartbeat, and supplied the requested id tag 112:
So that's good. From then on, A supplies a heartbeat 30 seconds later:
But B does not reply for 13 seconds:
Is there some network issue between A and B? What problem are you experiencing?