I have just downloaded the latest Arduino Library code from Github, and it's broken my MQTT client program. I'm using PubSubClient 1.91 on Arduino, and Mosquitto 1.1.2 (Build 2013-03-07) on Mac OSX. (I also tested against Mosquitto on Windows 7, same problem.)
The supplied Mosquitto clients work fine, (Mac over to Windows, Windows over to Mac) so it's some problem with what's coming from the Arduino end. A wireshark trace shows the Arduino client sending the following data packet:
10:15:ff:ff:4d:51:49:73:64:70:03:02:00:0f:00:07:41:72:64:75:69:6e:6f
And the Mosquitto broker shows:
New connection from 10.0.0.115
Socket read error on client (null), disconnecting.
Before I start to crawl through the MQTT spec, can anyone see anything wrong with the data packet being sent? It's got to be something to do with new Arduino library code...
* Update
Upon further investigation, it appears to be a code generation problem with avr-g++, although life experience tells me it will turn out not to be so. Here is a snippet of code from PubSubClient.cpp
boolean PubSubClient::connect(char *id, char *user, char *pass, char* willTopic, uint8_t willQos, uint8_t willRetain, char* willMessage) {
if (!connected()) {
int result = 0;
if (domain != NULL) {
result = _client->connect(this->domain, this->port);
} else {
result = _client->connect(this->ip, this->port);
}
if (result) {
nextMsgId = 1;
uint8_t d[9] = { 0x00, 0x06, 'M','Q','I','s','d','p',MQTTPROTOCOLVERSION};
// d[0] = 0;
// d[1] = 6;
Serial.print("d[0]="); Serial.println(d[0],HEX);
Now, the result of the Serial.print just above turns out to be 0xFF !!! So, the uint8_t array is not being initialised correctly. @knoleary Your pointer to the bad FF bytes lead me to this.
If I now uncomment the two lines above, and manually initialise the first 2 bytes to 0 and 6, all works fine, and my program communicates happily with Mosquitto.
I've looked at the generated code, but I'm not an Atmel expert.
Does anyone have any clue why this might be?
I'm compiling using the AVR-G++ toolset from Arduino 1.05, in Eclipse.
I'm going for a beer!
OK, I found it. It's a relatively subtle bug. Essentially, when the following line of source code is compiled;
uint8_t d[9] = { 0x00, 0x06, 'M','Q','I','s','d','p',MQTTPROTOCOLVERSION};
the 9 bytes get stored as a constant in the data section of the image. At runtime, a small loop copies the 9 bytes into the array (d[]) By looking at a combined Assembler / source listing, I could see where in the data section the 9 bytes were stored, and then print them out at regular intervals, until I found what was over-writing them. (A bit primitive, I know!)
It turns out the there's a bug in WiFi.cpp , the Arduino WiFi code. Here's the code:
uint8_t WiFiClient::connected() {
if (_sock == 255) {
return 0;
} else {
uint8_t s = status();
return !(s == LISTEN || s == CLOSED || s == FIN_WAIT_1 ||
s == FIN_WAIT_2 || s == TIME_WAIT ||
s == SYN_SENT || s== SYN_RCVD ||
(s == CLOSE_WAIT));
}
}
It turns out the the _sock variable is actually initialised like this:
WiFiClient::WiFiClient() : _sock(MAX_SOCK_NUM) {
}
and MAX_SOCK_NUM is 4, not 255. So, WiFiClient::status returned true, instead of false for an unused Socket.
This method was called by the MQTT Client like this:
boolean PubSubClient::connected() {
boolean rc;
if (_client == NULL ) {
rc = false;
} else {
rc = (int)_client->connected();
if (!rc) _client->stop();
}
return rc;
}
And, since the _client->connected() method erroneously returned true, the _client_stop() method was called. This resulted in a write to a non-existent socket array element, and so overwrote my string data.
@knolleary, I was wondering, is there any specific reason that your PubSubClient::connected() method does a disconnect? I use the ::connected method in a loop, to check that I'm still connected, and, of course it results in my getting a disconnect / reconnect each time round the loop. Any chance we could just make connected return true / false , and handle the disconnect in PuBSubClient::connect?
Nearly one and a half year later I ran into the same problem. Removing the
boolean PubSubClient::connected() {
int rc = (int)_client->connected();
if (!rc) _client->stop();
return rc;
}
the _client->stop() from the connected method of PubSubClient forehand fixed this problem for me. However, I'm not sure whether this is actually a solution or just a very dirty quick hack to localize the problem.
What have you done to fix this problem - your explanation of the problem above is fine however, I was not able to extract the solution easily ;-)