I have separate Docker (1.9.1) images for Hadoop (2.7.1) namenodes and datanodes. I can create containers from these and have them communicate over a user-defined Docker network. However, the datanode appears to report itself as having the IP address of the network gateway rather than its own IP address. Whilst this doesn't cause any issues with a single datanode, confusion reigns when additional datanodes are added: they all register with the same IP address and the namenode flips between them, only ever reporting that a single datanode is live.
Why does the server (namenode) read the wrong IP address from the client (datanode) socket connection when running over a user-defined Docker network and how can I fix it?
Update: This issue seems to be on the Docker side
Running two containers with --net=bridge and executing a netcat server:
nc -v -l 9000
in one container and a netcat client in the other:
nc 172.17.0.2 9000
causes the first container to correctly print:
Connection from 172.17.0.3 port 9000 [tcp/9000] accepted
But creating a user-defined network:
sudo docker network create --driver bridge test
and executing the same commands in containers started with --net=test incorrectly prints the IP address of the gateway/user-defined network interface:
Connection from 172.18.0.1 port 9000 [tcp/9000] accepted
HDFS/Docker Details
The dfs.datanode.address property in each datanode's hdfs-site.xml file is set to its hostname (for example, hdfs-datanode-1).
The network is created like this:
sudo docker network create --driver bridge hadoop-network
The namenode is started like this:
sudo docker run -d \
--name hdfs-namenode \
-v /hdfs/name:/hdfs-name \
--net=hadoop-network \
--hostname hdfs-namenode \
-p 50070:50070 \
hadoop:namenode
And the datanode is started like this:
sudo docker run -d \
--name hdfs-datanode-1 \
-v /hdfs/data_1:/hdfs-data \
--net=hadoop-network \
--hostname=hdfs-datanode-1 \
--restart=always \
hadoop:datanode
The two nodes connect fine and, when queried (using sudo docker exec hdfs-namenode hdfs dfsadmin -report), the connectivity is reported as:
...
Live datanodes (1):

Name: 172.18.0.1:50010 (172.18.0.1)
Hostname: hdfs-datanode-1
...
However, the output from running:
sudo docker exec hdfs-namenode cat /etc/hosts
indicates that the namenode thinks it's running on 172.18.0.2 and the datanode on 172.18.0.3:
172.18.0.2	hdfs-namenode
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.18.0.3	hdfs-datanode-1
172.18.0.3	hdfs-datanode-1.hadoop-network
And the equivalent file on the datanode shows the same:
172.18.0.3	hdfs-datanode-1
127.0.0.1	localhost
::1	localhost ip6-localhost ip6-loopback
fe00::0	ip6-localnet
ff00::0	ip6-mcastprefix
ff02::1	ip6-allnodes
ff02::2	ip6-allrouters
172.18.0.2	hdfs-namenode
172.18.0.2	hdfs-namenode.hadoop-network
Running ip route on both confirms this:
sudo docker exec hdfs-namenode ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0  proto kernel  scope link  src 172.18.0.2
sudo docker exec hdfs-datanode-1 ip route
default via 172.18.0.1 dev eth0
172.18.0.0/16 dev eth0  proto kernel  scope link  src 172.18.0.3
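For what it's worth, resolving each container's own hostname from inside the container tells the same story. A throwaway check like the class below (my addition, not part of the original setup) should print 172.18.0.3 when compiled and run inside hdfs-datanode-1, since the hostname resolves via the /etc/hosts entries shown above:

import java.net.InetAddress;

public class WhoAmI {
    public static void main(String[] args) throws Exception {
        // Resolves this container's own hostname (hdfs-datanode-1 here) using
        // the /etc/hosts entries written by Docker, so the expected output
        // inside the datanode container is 172.18.0.3
        System.out.println(InetAddress.getLocalHost().getHostAddress());
    }
}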
And yet, when the datanode starts up, the namenode reports the datanode's IP address as 172.18.0.1:
... INFO hdfs.StateChange: BLOCK* registerDatanode: from DatanodeRegistration(172.18.0.1:50010, datanodeUuid=3abaf40c-4ce6-47e7-be2b-fbb4a7eba0e3, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-60401abd-4793-4acf-94dc-e8db02b27d59;nsid=1824008146;c=0) storage 3abaf40c-4ce6-47e7-be2b-fbb4a7eba0e3
... INFO blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
... INFO net.NetworkTopology: Adding a new node: /default-rack/172.18.0.1:50010
... INFO blockmanagement.DatanodeDescriptor: Number of failed storage changes from 0 to 0
... INFO blockmanagement.DatanodeDescriptor: Adding new storage ID DS-4ba1a710-a4ca-4cad-8222-cc5f16c213fb for DN 172.18.0.1:50010
... INFO BlockStateChange: BLOCK* processReport: from storage DS-4ba1a710-a4ca-4cad-8222-cc5f16c213fb node DatanodeRegistration(172.18.0.1:50010, datanodeUuid=3abaf40c-4ce6-47e7-be2b-fbb4a7eba0e3, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-60401abd-4793-4acf-94dc-e8db02b27d59;nsid=1824008146;c=0), blocks: 1, hasStaleStorage: false, processing time: 3 msecs
And using tcpdump to capture the traffic between the two (run in a Docker container attached to the host network with docker run --net=host) seems to show the error occurring (br-b59d498905c5 is the name of the network interface created by Docker for hadoop-network):
tcpdump -nnvvXS -s0 -i br-b59d498905c5 \
"(src host 172.18.0.3 or src host 172.18.0.2) and \
(dst host 172.18.0.3 or dst host 172.18.0.2)"
The IP address seems to be sent correctly within the registerDatanode call:
... 172.18.0.3.33987 > 172.18.0.2.9000: ...
...
0x0050:  f828 004d 0a10 7265 6769 7374 6572 4461  .(.M..registerDa
0x0060:  7461 6e6f 6465 1237 6f72 672e 6170 6163  tanode.7org.apac
0x0070:  6865 2e68 6164 6f6f 702e 6864 6673 2e73  he.hadoop.hdfs.s
0x0080:  6572 7665 722e 7072 6f74 6f63 6f6c 2e44  erver.protocol.D
0x0090:  6174 616e 6f64 6550 726f 746f 636f 6c18  atanodeProtocol.
0x00a0:  01a7 010a a401 0a51 0a0a 3137 322e 3138  .......Q..172.18
0x00b0:  2e30 2e33 120f 6864 6673 2d64 6174 616e  .0.3..hdfs-datan
0x00c0:  6f64 652d 311a 2433 6162 6166 3430 632d  ode-1.$3abaf40c-
...
But in subsequent calls it is incorrect. For example, in the sendHeartbeat call a fraction of a second later:
... 172.18.0.3.33987 > 172.18.0.2.9000: ...
...
0x0050:  f828 004a 0a0d 7365 6e64 4865 6172 7462  .(.J..sendHeartb
0x0060:  6561 7412 376f 7267 2e61 7061 6368 652e  eat.7org.apache.
0x0070:  6861 646f 6f70 2e68 6466 732e 7365 7276  hadoop.hdfs.serv
0x0080:  6572 2e70 726f 746f 636f 6c2e 4461 7461  er.protocol.Data
0x0090:  6e6f 6465 5072 6f74 6f63 6f6c 1801 9d02  nodeProtocol....
0x00a0:  0aa4 010a 510a 0a31 3732 2e31 382e 302e  ....Q..172.18.0.
0x00b0:  3112 0f68 6466 732d 6461 7461 6e6f 6465  1..hdfs-datanode
0x00c0:  2d31 1a24 3361 6261 6634 3063 2d34 6365  -1.$3abaf40c-4ce
...
Debugging through the datanode code clearly shows the error occurring when the datanode registration details are updated in BPServiceActor.register() based on the information returned by the namenode:
bpRegistration = bpNamenode.registerDatanode(bpRegistration);
Debugging the namenode shows that it reads the incorrect IP address from the datanode socket connection and updates the datanode registration details.
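As far as I can tell, the namenode-side logic boils down to something like the sketch below (a simplified paraphrase of DatanodeManager.registerDatanode, not verbatim Hadoop source): whatever IP address the datanode reports in its registration payload is replaced with the address the RPC server observes on the incoming socket, which on the user-defined network is the gateway address.

// Simplified paraphrase of the namenode's registration handling (not verbatim
// Hadoop source); Server is org.apache.hadoop.ipc.Server and nodeReg is the
// DatanodeRegistration received over the wire.
void updateRegistration(DatanodeRegistration nodeReg) {
    InetAddress dnAddress = Server.getRemoteIp();   // remote end of the RPC socket
    if (dnAddress != null) {
        // On the user-defined Docker network this is the gateway (172.18.0.1)
        // rather than the datanode's own address (172.18.0.3), so the
        // 172.18.0.3 sent in the registerDatanode payload is discarded here.
        nodeReg.setIpAddr(dnAddress.getHostAddress());
        nodeReg.setPeerHostName(dnAddress.getHostName());
    }
}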
Additional Notes
I can reproduce the issue with this code running over a user-defined Docker network:
import java.net.ServerSocket;
import java.net.Socket;
public class Server {
public static void main(String[] args) throws Exception {
// 9000 is the namenode port
ServerSocket server = new ServerSocket(9000);
Socket socket = server.accept();
System.out.println(socket.getInetAddress().getHostAddress());
}
}
and
import java.net.Socket;
public class Client {
public static void main(String[] args) throws Exception {
// 172.18.0.2 is the namenode IP address
Socket socket = new Socket("172.18.0.2", 9000);
}
}
With both Server and Client running on 172.18.0.2 this correctly outputs 172.18.0.2, but with Client running on 172.18.0.3 it incorrectly outputs 172.18.0.1.
Running the same code without a user-defined network (on the default bridge network/docker0 interface, exposing port 9000) gives the correct output.
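For completeness, the client's own view of the connection can also be checked. A small variation on the Client class above (my addition, not part of the original test) prints the local address the client believes it is connecting from; inside the container on 172.18.0.3 this should still print 172.18.0.3 even while the server sees 172.18.0.1, since any rewriting of the source address happens outside the container:

import java.net.Socket;

public class ClientLocalAddress {
    public static void main(String[] args) throws Exception {
        // 172.18.0.2 is the namenode IP address
        try (Socket socket = new Socket("172.18.0.2", 9000)) {
            // The source address the client believes it is using; the
            // rewriting to 172.18.0.1 happens on the host, outside the
            // container's view
            System.out.println(socket.getLocalAddress().getHostAddress());
        }
    }
}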
I have the dfs.namenode.datanode.registration.ip-hostname-check property set to false in the namenode's hdfs-site.xml file to prevent reverse DNS lookup errors. This might be unnecessary in future if I get DNS working, but for now, with the wrong IP address being recorded for the datanodes, I doubt getting DNS working would help.
I believe the relevant wire protocols for registerDatanode, sendHeartbeat and blockReport are RegisterDatanodeRequestProto, HeartbeatRequestProto and BlockReportRequestProto, and their definitions can be found here. These all contain a DatanodeRegistrationProto as their first data member, which in turn holds a DatanodeIDProto. That message is defined here and looks like this:
/**
* Identifies a Datanode
*/
message DatanodeIDProto {
required string ipAddr = 1; // IP address
required string hostName = 2; // hostname
...
}
This is caused by a known Docker issue (I also raised, and closed, this duplicate, which describes the steps set out in the question).
There is a merged pull request that should fix the problem and is scheduled for inclusion in Docker 1.10.0. In the meantime, the following workaround can be used:
sudo docker network rm hadoop-network          # remove the user-defined network(s)
sudo service docker stop
sudo iptables -F && sudo iptables -F -t nat    # flush the stale filter and NAT rules
sudo service docker start