I'm looking into ZeroMQ to see if it's a fit for a soft-realtime application. I was very pleased to see that the latency for small payloads was in the range of 30 microseconds or so. However, in my simple tests I'm getting about 300 microseconds.
I have a simple publisher and subscriber, basically copied from examples off the web, and I'm sending one byte through localhost.
I've played around for about two days with different sockopts and I'm striking out.
Any help would be appreciated!
publisher:
#include <iostream>
#include <zmq.hpp>
#include <unistd.h>
#include <sys/time.h>

int main()
{
    zmq::context_t context (1);
    zmq::socket_t publisher (context, ZMQ_PUB);
    publisher.bind("tcp://*:5556");
    struct timeval timeofday;
    zmq::message_t msg(1);
    while(true)
    {
        gettimeofday(&timeofday,NULL);
        publisher.send(msg);
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
        usleep(1000000);
    }
}
subscriber:
#include <iostream>
#include <zmq.hpp>
#include <sys/time.h>

int main()
{
    zmq::context_t context (1);
    zmq::socket_t subscriber (context, ZMQ_SUB);
    subscriber.connect("tcp://localhost:5556");
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);
    struct timeval timeofday;
    zmq::message_t update;
    while(true)
    {
        subscriber.recv(&update);
        gettimeofday(&timeofday,NULL);
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
    }
}
Is the Task Definition real?
When speaking about any *-real-time design, validating the capability of the architecture is more important than the implementation that follows.
Taking your source code as-is, your readings (a pity they were not posted together with your code snippets, which would have allowed a cross-validation by re-running the MCVE) will not tell much, because the numbers do not distinguish how much time was spent in the sending-side loop, in the sending-side zmq data acquisition / copying / scheduling / wire-level formatting / datagram dispatch, and in the receiving-side unloading from the media / copying / decoding / pattern matching / propagation to the receiver buffer(s).
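To make that breakdown concrete, here is a minimal sketch of per-stage timestamping inside one process, using a monotonic clock. `StageTimer` is a hypothetical helper, not part of ZeroMQ, and the stage names are whatever you pass in; it only illustrates how to report the portions separately instead of one lump figure.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a ZeroMQ API): records named checkpoints on a
// monotonic clock so that per-stage durations can be reported separately.
class StageTimer {
public:
    using clock = std::chrono::steady_clock;

    // Record a checkpoint; call it right after a stage has finished.
    void mark(std::string name) {
        marks_.emplace_back(std::move(name), clock::now());
    }

    // Microseconds between consecutive checkpoints, labelled by the later one.
    std::vector<std::pair<std::string, double>> stages_us() const {
        std::vector<std::pair<std::string, double>> out;
        for (std::size_t i = 1; i < marks_.size(); ++i) {
            std::chrono::duration<double, std::micro> d =
                marks_[i].second - marks_[i - 1].second;
            out.emplace_back(marks_[i].first, d.count());
        }
        return out;
    }

private:
    std::vector<std::pair<std::string, clock::time_point>> marks_;
};
```

In the publisher you would call `mark("start")` before the send and `mark("send")` right after it, then print `stages_us()`. This only covers stages within a single process; for an end-to-end figure, a round-trip (ping-pong) measurement taken on one side avoids having to compare clocks across two processes.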
If interested in ZeroMQ internals, there are good performance-related application notes available.
If striving for a minimum-latency design do:
- […] tcp-header processing from the proposed PUB/SUB channel
- […] ( ZMQ_PAIR avoids any such, independently from the transport class )
- […] if it is intended to block something, then rather change the signalling socket layout accordingly, so as to principally avoid blocking ( this ought to be a real-time system, as you have said above )
- […] zmq::context_t context( N ); , where N > 1
Missing target:
As stated in Alice in Wonderland more than a century ago, when no goal is defined, any road leads to the target.
With a soft-real-time ambition, it should not be an issue to state a maximum allowed end-to-end latency and to derive from it a constraint for the transport-layer latency.
Without that, 30 us, 300 us or even 3 ms have no meaning per se, so no one can decide whether these figures are "enough" for some subsystem or not.
A reasonable next step:
[…] AlertPanel [ Waiting for data ] during your next jet landing, or have the last thing to see, before an autonomous car crashes right into the wall, be a lovely looking [hour-glass] animated icon, as it moves the sand while the control system got busy, whatever the reason behind it was, in a devastatingly blocking manner.
Quantified targets make sense for testing.
If a given threshold permits a 500 ms stability horizon (which may be a safe value for a slow-motion hydraulic actuator / control loop, but may fail for a guided-missile control system, and all the more for any system without [mass & moment of inertia], such as the DSP family of RT control systems), you can test end-to-end whether your processing fits within it.
If you know your incoming data stream brings about 10 kB every 500 us, you can test whether your design can keep pace with the burst traffic or not.
If your tests show that the mock-up design misses the target (not meeting the performance / time-constrained figures), you know pretty well where the design or the architecture needs to be improved.
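As a worked example of the arithmetic behind such a figure, a small sketch follows. It assumes 10 kB means 10,000 bytes, and both function names are made up for illustration.

```cpp
#include <cstdint>

// Sustained rate implied by `bytes` arriving every `period_us` microseconds.
constexpr double bytes_per_second(std::uint64_t bytes, std::uint64_t period_us) {
    return static_cast<double>(bytes) * 1e6 / static_cast<double>(period_us);
}

// A design keeps pace only if one burst is fully processed before the next arrives.
constexpr bool keeps_pace(double processing_us_per_burst, double period_us) {
    return processing_us_per_burst <= period_us;
}

// 10 kB (taken as 10,000 bytes, an assumption) every 500 us is 20 MB/s sustained.
static_assert(bytes_per_second(10000, 500) == 20e6, "10 kB per 500 us is 20 MB/s");
```

So the test is whether the whole receive-and-process path handles each burst in at most 500 us while sustaining roughly 20 MB/s.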
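A minimal sketch of such a check, assuming you have already collected latency samples in microseconds: the function names, the nearest-rank percentile method and the choice to judge the budget by the worst sample are my assumptions, not something prescribed by ZeroMQ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile (p in (0, 100]) of latency samples in microseconds.
double percentile_us(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    std::size_t rank =
        static_cast<std::size_t>(std::ceil(p / 100.0 * samples.size()));
    return samples[rank - 1];
}

// True if the worst observed sample stays within the stated budget.
bool meets_budget(const std::vector<double>& samples, double budget_us) {
    assert(!samples.empty());
    return *std::max_element(samples.begin(), samples.end()) <= budget_us;
}
```

With samples of 30, 35, 40 and 300 us, a 100 us budget is missed because of the single 300 us outlier, even though the median looks fine. For a real-time target it is the tail that matters, not the typical value.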
First, make sure you run the producer and the consumer on different physical cores (not HT siblings). Second, it depends A LOT on the hardware and OS. Last time I measured kernel IO (4-5 years ago), the results were indeed 10 to 20 us around the send/recv system calls. You have to tune your kernel settings for low latency and set TCP_NODELAY.
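For the core-pinning part, a minimal Linux-only sketch using `sched_setaffinity` from glibc. The helper names are made up for illustration, and which CPU numbers are separate physical cores (rather than HT siblings) depends on your machine, so check the topology before picking them.

```cpp
#include <sched.h>  // sched_setaffinity, sched_getaffinity, CPU_* macros (Linux/glibc)

// Pin the calling thread to a single CPU. Returns true on success.
bool pin_current_thread_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // pid 0 = calling thread
}

// First CPU the calling thread is currently allowed to run on, or -1 on error.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}
```

You would call `pin_current_thread_to_cpu()` at the top of `main()` in the publisher and in the subscriber, each with a different CPU number. Running each process under `taskset -c <cpu>` achieves the same without code changes and also covers the threads ZeroMQ starts internally.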