I have separate Docker (1.9.1) images for Hadoop (2.7.1) namenodes and datanodes. I can create containers from these and have them communicating over a user-defined Docker network. However, the datanode appears to report itself as having the IP address of the network gateway rather than its own IP address. Whilst this doesn't cause any issues with a single datanode, confusion reigns when additional datanodes are added. They all register with the same IP address and the namenode flips between them, only ever reporting that a single datanode is live.
Why does the server (namenode) read the wrong IP address from the client (datanode) socket connection when running over a user-defined Docker network and how can I fix it?
Update: This issue seems to be on the Docker side
Running two containers with --net=bridge and executing a netcat server:
nc -v -l 9000
in one container and a netcat client in the other:
nc 172.17.0.2 9000
causes the first container to correctly print:
Connection from 172.17.0.3 port 9000 [tcp/9000] accepted
But creating a user-defined network:
sudo docker network create --driver bridge test
and executing the same commands in containers started with --net=test incorrectly prints the IP address of the gateway/user-defined network interface:
Connection from 172.18.0.1 port 9000 [tcp/9000] accepted
HDFS/Docker Details
The dfs.datanode.address property in each datanode's hdfs-site.xml file is set to its hostname (for example, hdfs-datanode-1).
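For example (a sketch of the relevant hdfs-site.xml entry; the 50010 port is the stock Hadoop 2.x default and is assumed here, since only the hostname part is described above):

<property>
  <name>dfs.datanode.address</name>
  <value>hdfs-datanode-1:50010</value>
</property>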
The network is created like this:
sudo docker network create --driver bridge hadoop-network
The namenode is started like this:
sudo docker run -d \
--name hdfs-namenode \
-v /hdfs/name:/hdfs-name \
--net=hadoop-network \
--hostname hdfs-namenode \
-p 50070:50070 \
hadoop:namenode
And the datanode is started like this:
sudo docker run -d \
--name hdfs-datanode-1 \
-v /hdfs/data_1:/hdfs-data \
--net=hadoop-network \
--hostname=hdfs-datanode-1 \
--restart=always \
hadoop:datanode
The two nodes connect fine, and when queried (using sudo docker exec hdfs-namenode hdfs dfsadmin -report) the connectivity is reported as:

...
Live datanodes (1):

Name: 172.18.0.1:50010 (172.18.0.1)
Hostname: hdfs-datanode-1
...
However, the output from running:

sudo docker exec hdfs-namenode cat /etc/hosts

indicates that the namenode thinks that it is running on 172.18.0.2 and that the datanode is running on 172.18.0.3:

172.18.0.2	hdfs-namenode
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.18.0.3	hdfs-datanode-1
172.18.0.3	hdfs-datanode-1.hadoop-network
And the equivalent on the datanode shows the same:

172.18.0.3	hdfs-datanode-1
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.18.0.2	hdfs-namenode
172.18.0.2	hdfs-namenode.hadoop-network
Running ip route on both confirms this:
sudo docker exec hdfs-namenode ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
sudo docker exec hdfs-datanode-1 ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.3
And yet, when the datanode starts up, the namenode reports the datanode's IP address as 172.18.0.1:
... INFO hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(172.18.0.1:50010, datanodeUuid=3abaf40c-4ce6-47e7-be2b-fbb4a7eba0e3, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-60401abd-4793-4acf-94dc-e8db02b27d59;nsid=1824008146;c=0) storage 3abaf40c-4ce6-47e7-be2b-fbb4a7eba0e3
... INFO blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
... INFO net.NetworkTopology: Adding a new node: /default-rack/172.18.0.1:50010
... INFO blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
... INFO blockmanagement.DatanodeDescriptor: Adding new storage ID DS-4ba1a710-a4ca-4cad-8222-cc5f16c213fb for DN 172.18.0.1:50010
... INFO BlockStateChange: BLOCK* processReport: from storage DS-4ba1a710-a4ca-4cad-8222-cc5f16c213fb node DatanodeRegistration(172.18.0.1:50010, datanodeUuid=3abaf40c-4ce6-47e7-be2b-fbb4a7eba0e3, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-60401abd-4793-4acf-94dc-e8db02b27d59;nsid=1824008146;c=0), blocks: 1, hasStaleStorage: false, processing time: 3 msecs
And using tcpdump to capture the traffic between the two (running in a Docker container attached to the host network, using docker run --net=host) seems to show the error occurring (br-b59d498905c5 is the name of the network interface created by Docker for the hadoop-network):
tcpdump -nnvvXS -s0 -i br-b59d498905c5 \
"(src host 172.18.0.3 or src host 172.18.0.2) and \
(dst host 172.18.0.3 or dst host 172.18.0.2)"
The IP address seems to be correctly sent within the registerDatanode call:
...
172.18.0.3.33987 > 172.18.0.2.9000: ...
...
0x0050:  f828 004d 0a10 7265 6769 7374 6572 4461  .(.M..registerDa
0x0060:  7461 6e6f 6465 1237 6f72 672e 6170 6163  tanode.7org.apac
0x0070:  6865 2e68 6164 6f6f 702e 6864 6673 2e73  he.hadoop.hdfs.s
0x0080:  6572 7665 722e 7072 6f74 6f63 6f6c 2e44  erver.protocol.D
0x0090:  6174 616e 6f64 6550 726f 746f 636f 6c18  atanodeProtocol.
0x00a0:  01a7 010a a401 0a51 0a0a 3137 322e 3138  .......Q..172.18
0x00b0:  2e30 2e33 120f 6864 6673 2d64 6174 616e  .0.3..hdfs-datan
0x00c0:  6f64 652d 311a 2433 6162 6166 3430 632d  ode-1.$3abaf40c-
...
But in subsequent calls it is incorrect. For example, in the sendHeartbeat call a fraction of a second afterwards:
...
172.18.0.3.33987 > 172.18.0.2.9000: ...
...
0x0050:  f828 004a 0a0d 7365 6e64 4865 6172 7462  .(.J..sendHeartb
0x0060:  6561 7412 376f 7267 2e61 7061 6368 652e  eat.7org.apache.
0x0070:  6861 646f 6f70 2e68 6466 732e 7365 7276  hadoop.hdfs.serv
0x0080:  6572 2e70 726f 746f 636f 6c2e 4461 7461  er.protocol.Data
0x0090:  6e6f 6465 5072 6f74 6f63 6f6c 1801 9d02  nodeProtocol....
0x00a0:  0aa4 010a 510a 0a31 3732 2e31 382e 302e  ....Q..172.18.0.
0x00b0:  3112 0f68 6466 732d 6461 7461 6e6f 6465  1..hdfs-datanode
0x00c0:  2d31 1a24 3361 6261 6634 3063 2d34 6365  -1.$3abaf40c-4ce
...
Debugging through the datanode code clearly shows the error occurring when the datanode registration details are updated in BPServiceActor.register() based on the information returned by the namenode:
bpRegistration = bpNamenode.registerDatanode(bpRegistration);
Debugging the namenode shows that it reads the incorrect IP address from the datanode socket connection and updates the datanode registration details.
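For context, the namenode-side logic responsible can be sketched roughly like this (a paraphrase from memory of the registration handling in Hadoop 2.7's DatanodeManager, not a verbatim copy):

import java.net.InetAddress;

import org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration;
import org.apache.hadoop.ipc.Server;

public class RegistrationSketch {
    // Rough paraphrase of DatanodeManager.registerDatanode() (Hadoop 2.7.x).
    void registerDatanode(DatanodeRegistration nodeReg) {
        // Inside an RPC handler, Server.getRemoteIp() returns the address
        // of the socket peer for the current call.
        InetAddress dnAddress = Server.getRemoteIp();
        if (dnAddress != null) {
            // The ipAddr sent by the datanode in the request is replaced
            // with the address read from the socket, which is how the
            // gateway address (172.18.0.1) ends up in the registration.
            nodeReg.setIpAddr(dnAddress.getHostAddress());
            nodeReg.setPeerHostName(dnAddress.getHostName());
        }
    }
}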
Additional Notes
I can reproduce the issue with this code running over a user-defined Docker network:
import java.net.ServerSocket;
import java.net.Socket;
public class Server {
public static void main(String[] args) throws Exception {
// 9000 is the namenode port
ServerSocket server = new ServerSocket(9000);
Socket socket = server.accept();
System.out.println(socket.getInetAddress().getHostAddress());
}
}
and
import java.net.Socket;
public class Client {
public static void main(String[] args) throws Exception {
// 172.18.0.2 is the namenode IP address
Socket socket = new Socket("172.18.0.2", 9000);
}
}
With both Server and Client running on 172.18.0.2, this correctly outputs 172.18.0.2, but with Client running on 172.18.0.3 it incorrectly outputs 172.18.0.1.
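To make the mismatch easier to see, the client can also print the address of its own end of the connection; a minimal sketch (the class name is mine, the hard-coded address is from the setup above):

import java.net.Socket;

public class ClientDebug {
    public static void main(String[] args) throws Exception {
        // 172.18.0.2 is the namenode IP address
        try (Socket socket = new Socket("172.18.0.2", 9000)) {
            // The client's view of its own address; when run on
            // 172.18.0.3 this should print 172.18.0.3, even though
            // the server sees the connection as coming from 172.18.0.1.
            System.out.println(socket.getLocalAddress().getHostAddress());
        }
    }
}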
Running the same code without using a user-defined network (on the default bridge network/docker0 interface, and exposing port 9000) gives the correct output.
I have the dfs.namenode.datanode.registration.ip-hostname-check property set to false in the namenode's hdfs-site.xml file to prevent reverse DNS lookup errors. This might be unnecessary in the future if I get DNS working, but for now, with the wrong IP address being reported by the datanodes, I doubt getting DNS working would help.
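That is, the namenode's hdfs-site.xml contains an entry like:

<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>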
I believe the relevant wire protocols for registerDatanode, sendHeartbeat and blockReport are RegisterDatanodeRequestProto, HeartbeatRequestProto and BlockReportRequestProto, and their definitions can be found here. These all contain a DatanodeRegistrationProto as their first data member, which in turn contains a DatanodeIDProto. That message is defined here and looks like this:
/**
* Identifies a Datanode
*/
message DatanodeIDProto {
required string ipAddr = 1; // IP address
required string hostName = 2; // hostname
...
}
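As a sanity check on the captures, the ipAddr bytes visible in the dumps decode as an ordinary length-delimited protobuf field; a minimal sketch using protobuf-java (framing simplified, only the innermost ipAddr field from the sendHeartbeat dump is shown):

import com.google.protobuf.CodedInputStream;

public class WireCheck {
    public static void main(String[] args) throws Exception {
        // The DatanodeIDProto prefix seen in the heartbeat capture above:
        // tag 0x0a (field 1, wire type 2), length 0x0a, then "172.18.0.1".
        byte[] bytes = {
            0x0a, 0x0a, '1', '7', '2', '.', '1', '8', '.', '0', '.', '1'
        };
        CodedInputStream in = CodedInputStream.newInstance(bytes);
        int tag = in.readTag();                     // 0x0a -> field number 1
        System.out.println("field " + (tag >>> 3)); // prints: field 1
        System.out.println("ipAddr = " + in.readString());
    }
}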