-->

Emulating network disconnects to locally test dist

Posted 2020-07-13 10:06

Question:

I have several instances of a distributed application running on localhost; every instance communicates with the others through certain ports, and together the instances form an ensemble. (I'm actually talking about ZooKeeper, running on Linux.)

Now I want to write unit tests to emulate ensemble partitioning.
E.g. I have 5 instances, and I want to split them into two groups of 3 and 2 so that an instance from one group cannot communicate with an instance from the other group. This would emulate a real-world situation in which 3 machines are in one datacenter, 2 machines are in another, and the datacenters become partitioned.

The problem is essentially to make a socket work selectively: speak to one socket, but not to another. One solution that comes to mind is to abstract the communication layer and inject testing rules into it (in the form of "if I'm an instance from one group, I'm not allowed to speak to an instance from another group: close the socket, ignore the data, or whatever").
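The "testing rules" idea could be sketched roughly as follows. This is only an illustration, not anything ZooKeeper provides: the class name PartitionRules, its methods, and the numeric instance ids are all hypothetical, and it assumes the communication layer has a hook where it can ask "may instance A talk to instance B?" before opening a socket or delivering data.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical test helper: decides whether two instances may talk. */
final class PartitionRules {
    // Maps instance id -> index of the group (partition side) it belongs to.
    private final Map<Integer, Integer> groupOf = new HashMap<>();

    /** Each array lists the instance ids on one side of the partition. */
    static PartitionRules of(int[]... groups) {
        PartitionRules rules = new PartitionRules();
        for (int g = 0; g < groups.length; g++) {
            for (int id : groups[g]) {
                rules.groupOf.put(id, g);
            }
        }
        return rules;
    }

    /** True only when both ids are known and belong to the same group. */
    boolean canTalk(int from, int to) {
        Integer a = groupOf.get(from);
        Integer b = groupOf.get(to);
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        // The 3 + 2 split from the question.
        PartitionRules rules = PartitionRules.of(new int[] {1, 2, 3}, new int[] {4, 5});
        System.out.println(rules.canTalk(1, 3)); // same group
        System.out.println(rules.canTalk(3, 4)); // across the partition
    }
}
```

A test would install such rules in the abstracted communication layer, which would then close the socket or drop the data whenever canTalk returns false, and swap in a single-group instance to "heal" the partition.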

But maybe there exist some tools for this, maybe some testing framework? In general, how do you test such cases in your distributed apps?


P.S. Though the question is tagged with "java" (since ZooKeeper is written in Java), it'd be great to hear solutions for any other language, or language-independent ones, maybe some Linux guru tricks.

Answer 1:

Maybe this answer will save a few hours of googling for someone.

Linux has a very good firewall utility, iptables. It allows you to block communication between processes, like this:

iptables -A INPUT -p tcp --sport <source port> --dport <dest port> -j DROP

While not a full-blown unit testing framework by any measure, this helps a bit with manual testing in my case.
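For driving this from a Java test, one possible sketch is a small helper that builds the argument list for the rule above, plus the matching delete (-D) form to remove it again. The class name IptablesRule and the ports 12001 and 12002 are hypothetical placeholders. Note that such a rule only matches packets carrying exactly that source and destination port pair, and that iptables has to be run as root.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds iptables argument lists for a test. */
final class IptablesRule {
    // Builds: iptables <action> INPUT -p tcp --sport <s> --dport <d> -j DROP
    private static List<String> command(String action, int sourcePort, int destPort) {
        return Arrays.asList(
                "iptables", action, "INPUT", "-p", "tcp",
                "--sport", Integer.toString(sourcePort),
                "--dport", Integer.toString(destPort),
                "-j", "DROP");
    }

    /** -A appends the DROP rule to the INPUT chain. */
    static List<String> block(int sourcePort, int destPort) {
        return command("-A", sourcePort, destPort);
    }

    /** -D deletes a rule with the same specification, undoing block(). */
    static List<String> unblock(int sourcePort, int destPort) {
        return command("-D", sourcePort, destPort);
    }

    public static void main(String[] args) {
        // Hypothetical ports; this only prints the commands, it does not run them.
        System.out.println(String.join(" ", block(12001, 12002)));
        System.out.println(String.join(" ", unblock(12001, 12002)));
    }
}
```

To actually apply a rule, the list could be handed to new ProcessBuilder(command).inheritIO().start() from a process with root privileges; calling unblock in test teardown keeps stale rules from leaking into later runs.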



Answer 2:

I would be tempted to say that this is integration testing rather than unit testing (or that it will be really hard to do proper unit testing of such a thing).

The few times I needed to test such things, I used virtualisation (e.g. VMware). You can test reboot, shutdown, etc. of nodes, all from one physical machine running several images.



Answer 3:

In the past I have disconnected network cables. You can always disable a network interface via the OS.



Answer 4:

Try Blockade: https://github.com/dcm-oss/blockade

It's a Docker-based tool that emulates the behaviour you want. It runs on Linux only, but there is a Vagrantfile for setup if you want to run it from another OS.