I have a container running Hadoop. I have another Dockerfile which contains Map-Reduce job commands such as creating an input directory, processing a default example, and displaying the output. The base image for the second Dockerfile is hadoop_image, created from the first Dockerfile.
EDIT
Dockerfile - for hadoop
#base image is ubuntu:precise
#cdh installation
#hadoop-0.20-conf-pseudo installation
#CMD to start-all.sh
start-all.sh
#start all the services under /etc/init.d/hadoop-*
The hadoop base image is created from this.
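In outline, the Dockerfile looks roughly like this (the CDH repository setup is omitted and the script path is illustrative):
FROM ubuntu:precise
# add the CDH repository, then install the pseudo-distributed config package
RUN apt-get update && apt-get install -y hadoop-0.20-conf-pseudo
# start-all.sh starts every service under /etc/init.d/hadoop-*
ADD start-all.sh /usr/local/bin/start-all.sh
CMD ["/usr/local/bin/start-all.sh"]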
Dockerfile 2 - for flume
#base image is hadoop
#flume-ng and flume-ng agent installation
#conf change
#flume-start.sh
flume-start.sh
#start flume services
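Roughly like this (again simplified; the package names follow the CDH naming and the script/conf paths are illustrative):
FROM hadoop_image
# install flume-ng and the flume-ng agent, then apply the conf change
RUN apt-get update && apt-get install -y flume-ng flume-ng-agent
ADD flume.conf /etc/flume-ng/conf/flume.conf
# flume-start.sh starts the flume services and ends with /bin/bash
ADD flume-start.sh /usr/local/bin/flume-start.sh
CMD ["/usr/local/bin/flume-start.sh"]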
I am running both containers separately. It works fine. But if I run
docker run -it flume_service
it starts flume and shows me a bash prompt [/bin/bash is the last line of flume-start.sh]. Then I execute
hadoop fs -ls /
in the second running container, I get the following error:
ls: Call From 514fa776649a/172.17.5.188 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I understand I am getting this error because the hadoop services are not started yet. But my doubt is: my first container is running, and I am using it as the base image for the second container, so why am I getting this error? Do I need to change anything in the hdfs-site.xml file on the flume container?
Pseudo-Distributed mode installation.
Any suggestions?
Or do I need to expose any ports or something like that? If so, please provide me with an example.
EDIT 2
When I run
sudo iptables -t nat -L -n
I see
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE tcp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-6
MASQUERADE udp -- 192.168.122.0/24 !192.168.122.0/24 masq ports: 1024-6
MASQUERADE all -- 192.168.122.0/24 !192.168.122.0/24
MASQUERADE all -- 172.17.0.0/16 0.0.0.0/0
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER all -- 0.0.0.0/0 !127.0.0.0/8 ADDRTYPE match dst-
Chain DOCKER (2 references)
target prot opt source destination
This is on the Docker host (docker@domain), not inside a container.
EDIT See the last comment under surazj's answer
I think I met the same problem. I also couldn't start the hadoop namenode and datanode with the hadoop command "start-all.sh" inside docker.
That is because it launches the namenode and datanode through "hadoop-daemons.sh", and that fails. The real problem is that "ssh" does not work in docker.
So, you can do either of the following
(solution 1) :
Replace every occurrence of "daemons.sh" with "daemon.sh" in start-dfs.sh, then run start-dfs.sh.
(solution 2) : run
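For example, something like this should do it (assuming start-dfs.sh lives under $HADOOP_PREFIX/sbin):
sed -i 's/daemons\.sh/daemon.sh/g' $HADOOP_PREFIX/sbin/start-dfs.sh
$HADOOP_PREFIX/sbin/start-dfs.sh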
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
$HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
You can see that the datanode and namenode are working fine with the "jps" command.
Regards.
Have you tried linking the container?
For example, your container named hadoop is running in pseudo-distributed mode, and you want to bring up another container that contains flume. You could link the containers like this
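(assuming your images are named hadoop_image and flume_image; the alias after the colon is what shows up in env):
docker run -d --name hadoop hadoop_image
docker run -it --name flume --link hadoop:hadoop flume_image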
When you get inside the flume container, type the env command to see the IP and ports exposed by the hadoop container.
From the flume container you should be able to do something like this (the ports on the hadoop container should be exposed):
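For instance, substituting the address that env reports for the hadoop container (the IP below is a placeholder):
hadoop fs -ls hdfs://<hadoop_container_ip>:8020/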
The error you are getting might be related to some hadoop services not running on the flume container. Do jps to check which services are running. But I think if you have the hadoop classpath set up correctly on the flume container, then you can run the above hdfs command (-ls hdfs://<hadoop-host>:8020/) without starting anything. But if you want it to work on the flume container, then you need to start the hadoop services on the flume container as well.
In your core-site.xml, add dfs.namenode.rpc-address like this so the namenode listens for connections from all IPs:
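A sketch of the property (the 0.0.0.0 bind address is the important part; adjust the port if yours differs):
<property>
  <name>dfs.namenode.rpc-address</name>
  <value>0.0.0.0:8020</value>
</property>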
Make sure to restart the namenode and datanode
sudo /etc/init.d/hadoop-hdfs-namenode restart && sudo /etc/init.d/hadoop-hdfs-datanode restart
Then you should be able to do this from your hadoop container without a connection error, e.g.
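For example (assuming the default namenode port 8020):
hadoop fs -ls hdfs://localhost:8020/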
On the linked container, type env to see the ports exposed by your hadoop container.
You should see something like HADOOP_PORT_8020_TCP=tcp://172.17.0.11:8020
Then you can verify the connection from your linked container, for example:
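Using the address from HADOOP_PORT_8020_TCP above (telnet is just a quick reachability check):
telnet 172.17.0.11 8020
hadoop fs -ls hdfs://172.17.0.11:8020/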