ZeroMQ - pub / sub latency

Posted 2019-05-14 02:29

Question:

I'm looking into ZeroMQ to see if it's a fit for a soft-realtime application. I was very pleased to see that the latency for small payloads was in the range of 30 microseconds or so. However, in my simple tests I'm getting about 300 microseconds.

I have a simple publisher and subscriber, basically copied from examples off the web and I'm sending one byte through localhost.

I've played around for about two days with different sockopts and I'm striking out.

Any help would be appreciated!

publisher:

#include <iostream>
#include <zmq.hpp>
#include <unistd.h>

#include <sys/time.h>


int main()
{
    zmq::context_t context (1);
    zmq::socket_t publisher (context, ZMQ_PUB);
    publisher.bind("tcp://*:5556");

    struct timeval timeofday;
    while(true)
    {
        // rebuild the message each turn: a successful send() nullifies a
        // zmq::message_t, so reusing one instance would publish empty frames
        zmq::message_t msg(1);
        gettimeofday(&timeofday,NULL);   // stamp taken just before the send
        publisher.send(msg);
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
        usleep(1000000);                 // pace: one message per second
    }
}

subscriber:

#include <iostream>
#include <zmq.hpp>
#include <sys/time.h>


int main()
{
    zmq::context_t context (1);
    zmq::socket_t subscriber (context, ZMQ_SUB);
    subscriber.connect("tcp://localhost:5556");
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);   // empty filter: receive everything

    struct timeval timeofday;
    zmq::message_t update;
    while(true)
    {
        subscriber.recv(&update);        // blocks until a message arrives
        gettimeofday(&timeofday,NULL);   // stamp taken right after the recv
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
    }
}

Answer 1:

First, make sure you run the producer and the consumer on different physical cores (not hyper-thread siblings). Second, it depends A LOT on the hardware and the OS. Last time I measured kernel IO (4-5 years ago), the results were indeed 10 to 20 us around the send/recv system calls. You have to tune your kernel settings for low latency and set TCP_NODELAY.
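For the first point, a minimal sketch of pinning a process to a fixed core on Linux (the core ids 2 and 3 below are arbitrary assumptions; pick two distinct physical cores on your machine, or do the same from the shell with taskset -c 2 ./publisher):

#include <sched.h>    // sched_setaffinity, CPU_* macros (Linux-specific;
                      // g++ defines the required _GNU_SOURCE by default)
#include <cstdio>

// pin the calling process to one core, so the publisher and the subscriber
// never share a physical core (or its hyper-thread sibling)
static bool pin_to_core(int core_id)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;   // 0 == this process
}

int main()
{
    if (!pin_to_core(2))                    // e.g. core 2 for the publisher,
        std::perror("sched_setaffinity");   //      core 3 for the subscriber
    // ... the publisher / subscriber code from the question goes here ...
}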



Answer 2:

Is the Task Definition real?

When speaking about a *-real-time design, validating the architecture's capability is more important than the implementation that follows it.

If taking your source code as-is, your readings (a pity they were not posted together with the code snippets, for cross-validation of a replicated MCVE retest) will not tell you much, because the numbers do not distinguish which portions of time were spent where: on the sending-side loop, on the sending-side zmq data-acquisition / copy / scheduling / wire-level formatting / datagram dispatch, and on the receiving side on unloading from the media / copy / decode / pattern-match / propagation to the receiver buffer(s).
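One way to at least measure the end-to-end one-way latency directly is to carry the send-side stamp inside the payload, which is meaningful here only because both processes read the same localhost clock; a minimal sketch along those lines, using the same old-style cppzmq API as the question:

// publisher side: embed the send timestamp in the message body
#include <zmq.hpp>
#include <sys/time.h>
#include <cstring>

void send_stamped(zmq::socket_t &sock)
{
    struct timeval t_send;
    gettimeofday(&t_send, NULL);
    zmq::message_t msg(sizeof(t_send));
    std::memcpy(msg.data(), &t_send, sizeof(t_send));
    sock.send(msg);
}

// subscriber side: one-way latency from the embedded stamp, in microseconds
long recv_latency_us(zmq::socket_t &sock)
{
    zmq::message_t msg;
    sock.recv(&msg);
    struct timeval t_send, t_recv;
    std::memcpy(&t_send, msg.data(), sizeof(t_send));
    gettimeofday(&t_recv, NULL);
    return (t_recv.tv_sec  - t_send.tv_sec) * 1000000L
         + (t_recv.tv_usec - t_send.tv_usec);
}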

If interested in ZeroMQ internals, there are good performance-related application notes available.

If striving for a minimum-latency design, do:

  • remove all overheads
    • remove the tcp-header processing from the proposed PUB/SUB channel ( on a single host, the ipc:// or inproc:// transport classes avoid the tcp stack altogether )
    • avoid all non-cardinal logic overheads in processing: there is no sense in spending time on subscribe-side pattern matching encoded in the selected archetype's processing ( sure, newer versions of ZeroMQ have moved the filtering onto the publisher side, but the idea is clear ); using ZMQ_PAIR avoids any such matching, independently of the transport class. If something is intended to block, rather change the signalling-socket layout accordingly, so as to principally avoid blocking ( this ought to be a real-time system, as you have said above )
    • apply "latency-masking" where possible on the target multi-core / many-core hardware architecture, so as to squeeze the last drops of spare time out of your hardware / tooling capabilities ... benchmark experiment setups with more I/O-threads' help, zmq::context_t context( N ); with N > 1 ( a sketch combining these points follows right below this list )
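A minimal sketch combining these points: ZMQ_PAIR ( no subscription matching ) over ipc:// ( no tcp-header processing, same host only ), with more than one I/O thread; the endpoint name is an arbitrary assumption:

#include <zmq.hpp>

int main()
{
    zmq::context_t context(2);                // N > 1 I/O threads
    zmq::socket_t peer(context, ZMQ_PAIR);    // exclusive 1:1 link, no topic filtering
    peer.bind("ipc:///tmp/rt_link");          // ipc:// skips the tcp stack entirely
    // the other process runs:  peer.connect("ipc:///tmp/rt_link");

    zmq::message_t msg(1);
    peer.send(msg);                           // no subscribe-side pattern match on receipt
    return 0;
}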

Missing target:

As the Cheshire Cat told Alice in Wonderland more than a century ago, if you do not know where you want to get to, then any road will take you there.

Having a soft-real-time ambition, there should be no problem stating a maximum allowed end-to-end latency, and deriving from it a constraint on the transport-layer latency.

Without having done so, 30 us, 300 us or even 3 ms have no meaning per se, so no one can decide whether such figures are "enough" for a given subsystem or not.

A reasonable next step:

  • define the real-time stability horizon(s) ... if used for real-time control
  • define the real-time design constraints ... for signal / data acquisition(s), for processing task(s), for self-diagnostic & control services
  • avoid any blocking, design-wise, and validate / prove that no blocking will ever appear under any possible real-world operating circumstances [ formal proof methods are ready for such a task ] ( a non-blocking receive sketch follows this list ). No one wants to see an AlertPanel [ Waiting for data ] during their next jet landing, nor to have the last thing seen, before an autonomous car crashes right into the wall, be a lovely animated [hour-glass] icon shuffling sand while the control system sat busy, in a devastatingly blocking manner, whatever the reason behind it was.
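On the blocking point, a minimal sketch of a receive loop that never parks inside recv() ( the endpoint is the same hypothetical one as above ):

#include <zmq.hpp>

int main()
{
    zmq::context_t context(1);
    zmq::socket_t rx(context, ZMQ_PAIR);
    rx.connect("ipc:///tmp/rt_link");         // hypothetical endpoint

    zmq::message_t msg;
    while(true)
    {
        // ZMQ_DONTWAIT makes recv() return false immediately when nothing is
        // pending, so the control loop is never parked inside a blocking call
        if (rx.recv(&msg, ZMQ_DONTWAIT))
        {
            // ... process the freshly arrived update ...
        }
        // ... service the rest of the control-loop deadlines here ...
    }
}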

Quantified targets make sense for testing.

If a given threshold permits a 500 ms stability horizon ( which may be a safe value for a slow-motion hydraulic-actuator control loop, but may fail to work for a guided-missile control system, the more so for any [mass & momentum-of-inertia]-less system, like the DSP family of RT-control systems ), you can test end-to-end whether your processing fits in between.

If you know your incoming data stream brings about 10 kB every 500 us, you can test whether your design keeps pace with such burst traffic or not ( a rough pacing check is sketched below ).
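A rough pacing check under the figures above ( the endpoint and the once-per-second reporting are illustrative assumptions ): count the messages that actually arrive each second against the 2000 msg/s that a 500 us period implies:

#include <zmq.hpp>
#include <sys/time.h>
#include <iostream>

int main()
{
    zmq::context_t context(1);
    zmq::socket_t rx(context, ZMQ_PAIR);
    rx.connect("ipc:///tmp/rt_link");         // hypothetical endpoint

    const long expected_per_sec = 2000;       // one ~10 kB message per 500 us
    long received = 0;
    struct timeval t0, t;
    gettimeofday(&t0, NULL);

    zmq::message_t msg;
    while(true)
    {
        rx.recv(&msg);                        // expecting ~10 kB payloads
        ++received;
        gettimeofday(&t, NULL);
        if (t.tv_sec > t0.tv_sec)             // roughly once per second, report pace
        {
            std::cout << received << " / " << expected_per_sec
                      << " msgs in the last second" << std::endl;
            received = 0;
            t0 = t;
        }
    }
}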

If your tests show that the mock-up design misses the target ( does not meet the performance / time-constraint figures ), you know pretty well where the design or the architecture needs to be improved.