Akka.net remoting over Docker containers: client r

Posted 2019-08-13 15:26

Question:

There is a simple host with a TestActor that only writes a string it receives to the console:

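using (var actorSystem = ActorSystem.Create("host", HoconLoader.FromFile("config.hocon")))
{
    // Register the actor under /user/TestActor so that the client can select it remotely.
    var testActor = actorSystem.ActorOf(Props.Create<TestActor>(), "TestActor");

    Console.WriteLine("Waiting for requests...");

    // Keep the process (and therefore the container) alive indefinitely.
    while (true)
    {
        Task.Delay(1000).Wait();
    }
}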

On the other side there is a simple client that selects the remote actor and passes a TestMessage to it, then waits on an ask without a timeout specified.

using (var actorSystem = ActorSystem.Create("client", HoconLoader.FromFile("config.hocon")))
{
    // Resolve the remote actor by the host system's address and actor path.
    var testActor = actorSystem.ActorSelection("akka.tcp://host@host:8081/user/TestActor");

    Console.WriteLine("Sending message...");

    // No timeout is passed, so this blocks until a reply arrives.
    testActor.Ask(new TestMessage("Message")).Wait();

    Console.WriteLine("Message ACKed.");
}
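
Because the Ask above has no timeout, a lost association shows up only as a hang. The following is a minimal sketch of bounding the wait instead. `ClientSketch` and `TryAskAsync` are hypothetical names introduced here for illustration. The exception type raised on timeout differs between Akka.NET versions, so the sketch catches broadly rather than naming one.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;

public static class ClientSketch
{
    // Sends a message to the selected actor and waits at most `timeout` for a reply.
    // Returns true if a reply arrived, false if the Ask failed or timed out.
    public static async Task<bool> TryAskAsync(ActorSelection target, object message, TimeSpan timeout)
    {
        try
        {
            // Ask takes an optional timeout; passing one bounds the wait.
            await target.Ask<object>(message, timeout);
            return true;
        }
        catch (Exception ex)
        {
            // Assumption: any exception here is treated as "no reply in time".
            Console.WriteLine($"Ask failed or timed out: {ex.GetType().Name}: {ex.Message}");
            return false;
        }
    }
}
```

In the client it could be called as `ClientSketch.TryAskAsync(testActor, new TestMessage("Message"), TimeSpan.FromSeconds(5)).Result`, which turns the indefinite hang into a visible failure after five seconds.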

The client and the host are deployed on two Docker containers (docker-compose), whose network configuration is as follows (docker network inspect ...):

[
    {
        "Name": "akkaremotetest_default",
        "Id": "4995d7e340e09e4babcca7dc02ddf4f68f70761746c1246d66eaf7ee40ccec89",
        "Created": "2018-07-21T07:55:39.3534215Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "6040c260c5195d2fe350bf3c89b5f9ede8a65d44da6adb48817fbef266a99e07": {
                "Name": "akkaremotetest_host_1",
                "EndpointID": "a6220a6fee071a29b83e30f9aeb9b9e7ec5008f04f593ff3fb2464477a7e54aa",
                "MacAddress": "02:42:ac:13:00:02",
                "IPv4Address": "172.19.0.2/16",
                "IPv6Address": ""
            },
            "a97078c28c7d221c2c9af948fe36b72590251be69e06d0e66eafd2c74f416037": {
                "Name": "akkaremotetest_client_1",
                "EndpointID": "39bcb8b1047ad666d9c568ee968602b3a93edb4ac2151ba9c3f3c02359ef84f2",
                "MacAddress": "02:42:ac:13:00:03",
                "IPv4Address": "172.19.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]

When the containers are started, the result is one of the following:

  • the client succeeds with the Ask, the actor writes the received message to the console, and the client confirms success,
  • the client hangs forever, the actor never receives the message, and no timeout occurs.

The problem is that the latter happens most of the time, but only when the host and the client are deployed in Docker containers. When they are run outside Docker, there are no communication issues.

I have tried everything I could think of without success, and I don't know how else to investigate why the client's Ask never completes while neither of the two actor systems logs an error.

Here is the Docker configuration (yml):

version: '2'

services:

  host:
    ports:
      - 8081:8081
    build:
      context: .
      dockerfile: Dockerfile
      args:
        PROJECT_DIR: Host
        PROJECT_NAME: Host
        WAIT_FOR_HOST: 0
    restart: on-failure

  client:
    depends_on:
      - host
    ports:
      - 8082:8082
    build:
      context: .
      dockerfile: Dockerfile
      args:
        PROJECT_DIR: Client
        PROJECT_NAME: Client
        WAIT_FOR_HOST: 1
    restart: on-failure

  tcpdump:
    image: kaazing/tcpdump
    network_mode: "host"
    volumes:
      - ./tcpdump:/tcpdump

Here is the configuration of the client system (config.hocon):

akka {     
    actor {
        provider = remote
    }

    remote {
        dot-netty.tcp {
            enabled-transports = ["akka.remote.netty.tcp"]
            hostname = client
            port = 8082
        }
    }

    stdout-loglevel = DEBUG
    loglevel = DEBUG
    log-config-on-start = on        

    actor {      
        creation-timeout = 20s  
        debug {  
              receive = on 
              autoreceive = on
              lifecycle = on
              event-stream = on
              unhandled = on
              fsm = on
              event-stream = on
              log-sent-messages = on
              log-received-messages = on
              router-misconfiguration = on
        }
    }
}

Here is the configuration of the host system (config.hocon):

akka {     
    actor {
        provider = remote
    }

    remote {
        dot-netty.tcp {
            enabled-transports = ["akka.remote.netty.tcp"]
            hostname = host
            port = 8081
        }
    }

    stdout-loglevel = DEBUG
    loglevel = DEBUG
    log-config-on-start = on        

    actor {        
        creation-timeout = 20s  
        debug {  
              receive = on 
              autoreceive = on
              lifecycle = on
              event-stream = on
              unhandled = on
              fsm = on
              event-stream = on
              log-sent-messages = on
              log-received-messages = on
              router-misconfiguration = on
        }
    }
}

Following the documentation concerning Akka remote configuration, I attempted to change the client configuration like this:

remote {
    dot-netty.tcp {
        enabled-transports = ["akka.remote.netty.tcp"]

        hostname = 172.19.0.3
        port = 8082

        bind-hostname = client
        bind-port = 8082 
    }
}

and the host configuration by analogy:

remote {
    dot-netty.tcp {
        enabled-transports = ["akka.remote.netty.tcp"]

        hostname = 172.19.0.2
        port = 8081

        bind-hostname = host
        bind-port = 8081 
    }
}

with a slight change in actor selection as well:

var testActor = actorSystem.ActorSelection("akka.tcp://host@172.19.0.2:8081/user/TestActor");

Unfortunately, this did not help at all; the behavior was unchanged.
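
For comparison, here is a sketch of how the same bind/advertise split is usually written for Akka.NET's DotNetty transport. It uses `hostname` for the address the socket binds to and `public-hostname` for the address advertised to other systems, and it places `enabled-transports` directly under `remote`. The `0.0.0.0` bind address is an assumption for illustration, and this variant was not tried against the setup described here.

```hocon
remote {
    # Transport list lives under akka.remote and names the dot-netty section.
    enabled-transports = ["akka.remote.dot-netty.tcp"]

    dot-netty.tcp {
        # Address the socket binds to inside the container (assumed value).
        hostname = 0.0.0.0
        port = 8081

        # Address other actor systems use to reach this one (the compose service name).
        public-hostname = host
    }
}
```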

In the logs generated during the process, there is one crucial entry, produced by the host system. Communication succeeds only when this entry appears, and most often it does not:

[DEBUG][07/21/2018 09:42:50][Thread 0006][remoting] Associated [akka.tcp://host@host:8081] <- akka.tcp://client@client:8082

Any help will be appreciated. Thank you!

-- EDIT --

I added the tcpdump section to the yml file and opened the generated dump file in Wireshark. I also added a 5-second timeout to the wait on the Ask. It is hard for me to interpret the results, but here is what I got on a failed connection attempt:

172.19.0.3 -> 172.19.0.2: SYN
172.19.0.2 -> 172.19.0.3: SYN, ACK

172.19.0.3 -> 172.19.0.2: ACK

[a 5-second period of silence (waiting till timeout)]

172.19.0.3 -> 172.19.0.2: FIN, ACK

172.19.0.2 -> 172.19.0.3: ACK
172.19.0.2 -> 172.19.0.3: FIN, ACK

172.19.0.3 -> 172.19.0.2: ACK

and here is what happens when the connection succeeds:

172.19.0.3 -> 172.19.0.2: SYN
172.19.0.2 -> 172.19.0.3: SYN, ACK

172.19.0.3 -> 172.19.0.2: ACK
172.19.0.3 -> 172.19.0.2: PSH, ACK

172.19.0.2 -> 172.19.0.3: ACK
172.19.0.2 -> 172.19.0.3: PSH, ACK

172.19.0.3 -> 172.19.0.2: ACK
172.19.0.3 -> 172.19.0.2: PSH, ACK

Versions:

  • Akka.NET 1.3.8
  • .NET Core 2.1.1
  • Docker 18.03.1-ce, build 9ee9f40
  • Docker-compose 1.21.1, build 7641a569

Answer 1:

It turns out that the issue stems from the projects targeting .NET Core 2.1, which Akka.NET does not support yet, according to this:

We don't officially support .NET Core 2.1 yet. Heck, we aren't even on netstandard 2.0 yet (although work is underway). But thanks for confirming that there are indeed issues :)

After switching to .NET Core 2.0, I can no longer reproduce the described issue.
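
A minimal sketch of what the switch could look like in each project file, assuming SDK-style .csproj files. The question does not show them, so the layout and the `Akka.Remote` package reference are assumptions. The base images used in the Dockerfile would presumably need to match the 2.0 SDK and runtime as well.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <!-- Target .NET Core 2.0 instead of 2.1 -->
    <TargetFramework>netcoreapp2.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- Version taken from the question's version list -->
    <PackageReference Include="Akka.Remote" Version="1.3.8" />
  </ItemGroup>
</Project>
```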