I've created a single-node Hadoop cluster on Ubuntu in EC2.
Testing a simple file upload to HDFS works from the EC2 machine, but doesn't work from a machine outside of EC2.
I can browse the filesystem through the web interface from the remote machine, and it shows one datanode, which is reported as in service. I have opened all TCP ports in the security group from 0 to 60000(!), so I don't think it's that.
I get the error:
java.io.IOException: File /user/ubuntu/pies could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)
at org.apache.hadoop.ipc.Client.call(Client.java:905)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:928)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:811)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)
The namenode log just gives the same error. The other logs don't seem to have anything interesting.
Any ideas?
Cheers
Don't format the name node immediately. Try running stop-all.sh and then starting again with start-all.sh. If the problem persists, then go for formatting the name node.
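One way to confirm that a restart actually brought the daemons back is to look at the output of jps, the JDK tool that lists running Java processes. The following is only a sketch: it assumes the Hadoop scripts and jps are on PATH, missing_daemons is a hypothetical helper name, and checking just NameNode and DataNode is an assumption that fits a single-node setup.

```shell
# Prints each expected HDFS daemon that is absent from jps-style input on stdin.
# (missing_daemons is an illustrative helper, not part of Hadoop.)
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode; do
        # jps prints "<pid> <main class>", so compare the class name as a whole field;
        # this keeps SecondaryNameNode from being counted as NameNode.
        printf '%s\n' "$listing" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

# Restart only when the Hadoop scripts and jps are actually available.
if command -v stop-all.sh >/dev/null 2>&1 && command -v jps >/dev/null 2>&1; then
    stop-all.sh
    start-all.sh
    jps | missing_daemons
fi
```

If the last command prints DataNode, the datanode did not come up, and its log is the next place to look before formatting anything.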
I realize I'm a little late to the party, but I wanted to post this for future visitors of this page. I was having a very similar problem when I was copying files from local to hdfs and reformatting the namenode did not fix the problem for me. It turned out that my namenode logs had the following error message:
Apparently, this is a relatively common problem on hadoop clusters and Cloudera suggests increasing the nofile and epoll limits (if on kernel 2.6.27) to work around it. The tricky thing is that setting nofile and epoll limits is highly system dependent. My Ubuntu 10.04 server required a slightly different configuration for this to work properly, so you may need to alter your approach accordingly.
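Before changing any limits, it can help to see what the current open-file limit is for the user that runs Hadoop. This is a rough sketch, not a recommended configuration: the threshold of 32768 is only an illustrative value, nofile_ok is a hypothetical helper, and the script has to be run as the same user that starts the daemons for the number to mean anything.

```shell
# Illustrative threshold only; pick a value that suits your own system.
MIN_NOFILE=32768

# Succeeds when the given limit (a number or "unlimited") is at least min.
nofile_ok() {
    limit=$1
    min=$2
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$min" ]
}

# ulimit -n reports the soft open-file limit of the current shell.
current=$(ulimit -n)
if nofile_ok "$current" "$MIN_NOFILE"; then
    echo "open-file limit $current looks sufficient"
else
    echo "open-file limit $current is below $MIN_NOFILE; consider raising nofile"
fi
```

How the limit is raised persistently depends on the distribution, which is the system-dependent part mentioned above.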
Follow the steps below:
1. Stop dfs and yarn.
2. Remove the datanode and namenode directories as specified in core-site.xml.
3. Start dfs and yarn as follows:
WARNING: The following will destroy ALL data on HDFS. Do not execute the steps in this answer unless you do not care about destroying existing data!!
You should do this:
hdfs namenode -format
Answer with a capital Y.
Also, check the disk space on your system and make sure the logs are not warning you about it.
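A quick way to check the disk space is to look at how full the filesystem holding the HDFS data directory is. This is a minimal sketch under a few assumptions: DATA_DIR defaults to /tmp only because Hadoop's default hadoop.tmp.dir lives under /tmp, so point it at your own data directory; the 90 percent threshold is arbitrary; and percent_used is a hypothetical helper that assumes the filesystem name contains no spaces.

```shell
# Directory to check; /tmp is a placeholder, set DATA_DIR to your datanode directory.
DATA_DIR=${DATA_DIR:-/tmp}
# Arbitrary warning threshold, in percent.
MAX_USED=90

# Prints the used percentage (without the % sign) of the filesystem holding a path.
# df -P gives the portable layout: line 2, column 5 is the capacity in percent.
percent_used() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

used=$(percent_used "$DATA_DIR")
if [ "$used" -ge "$MAX_USED" ]; then
    echo "WARNING: filesystem for $DATA_DIR is ${used}% full"
else
    echo "filesystem for $DATA_DIR is ${used}% full"
fi

# On a running cluster, `hdfs dfsadmin -report` shows the remaining
# capacity that each datanode reports to the namenode.
```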
I'll try to describe my setup and solution. My setup: RHEL 7, hadoop-2.7.3.
I tried to set up Standalone Operation first and then Pseudo-Distributed Operation, where the latter failed with the same issue.
However, when I started hadoop with:
I got the following:
which looked promising (starting datanode... with no failures), but the datanode did not actually exist.
Another indication was that there was no datanode in operation (the snapshot below shows the fixed, working state):
I fixed the issue by doing:
and then starting again:
It took me a week to figure out the problem in my situation.
When the client (your program) asks the nameNode for a data operation, the nameNode picks a dataNode and directs the client to it by giving the client the dataNode's IP.
But when the dataNode host is configured with multiple IPs and the nameNode gives you one that your client CAN'T ACCESS, the client adds the dataNode to its exclude list and asks the nameNode for a new one. Eventually all dataNodes are excluded, and you get this error.
So check the node's IP settings before you try anything else!
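To see whether the client machine can actually reach the address the nameNode hands out, a plain TCP connection test from the client is often enough. The sketch below rests on several assumptions: bash and the coreutils timeout command are installed on the client, can_connect is a hypothetical helper, and DATANODES is a made-up variable holding the address and data-transfer port the dataNode registered with (the default port is 50010 on Hadoop 1.x and 2.x and 9866 on 3.x).

```shell
# Succeeds if a TCP connection to host:port opens within 3 seconds.
# Uses bash's /dev/tcp feature, so bash is invoked explicitly.
can_connect() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# DATANODES is a hypothetical space-separated list, e.g. "10.0.0.5:50010".
for target in ${DATANODES:-}; do
    host=${target%:*}
    port=${target#*:}
    if can_connect "$host" "$port"; then
        echo "$target reachable"
    else
        echo "$target NOT reachable; the client would exclude this dataNode"
    fi
done
```

Run it from the machine where the upload fails, for example with DATANODES set to the address shown for the dataNode in the web interface. If that address is a private one that only resolves inside EC2, the failure described above is the likely cause.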