HDFS error: could only be replicated to 0 nodes, instead of 1

Posted 2020-01-26 03:41

I've created an Ubuntu single-node Hadoop cluster in EC2.

Testing a simple file upload to HDFS works from the EC2 machine, but doesn't work from a machine outside of EC2.
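
(For reference, the kind of upload I'm testing boils down to the standard FileSystem write path, roughly the sketch below; the namenode URI and paths are placeholders, not my real values.)

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address -- replace with the cluster's actual host and port.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:9000"), conf);
        // The exception below is thrown while the client streams the first block to a datanode.
        fs.copyFromLocalFile(new Path("/tmp/pies"), new Path("/user/ubuntu/pies"));
        fs.close();
    }
}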

I can browse the filesystem through the web interface from the remote machine, and it shows one datanode which is reported as in service. I have opened all TCP ports from 0 to 60000(!) in the security group, so I don't think it's that.

I get the following error:

java.io.IOException: File /user/ubuntu/pies could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1448)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:690)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)

at org.apache.hadoop.ipc.Client.call(Client.java:905)
at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:928)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:811)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:427)

The namenode log just shows the same error; the other logs don't seem to contain anything interesting.

Any ideas?

Cheers

16 Answers

Answer 2 · 2020-01-26 03:53

This exception (could only be replicated to 0 nodes, instead of 1) means that no datanode is available to the namenode.

A datanode may be unavailable to the namenode in the following cases:

  1. The datanode's disk is full.

  2. The datanode is busy with its block report and block scanning.

  3. The block size is set to a negative value (dfs.block.size in hdfs-site.xml).

  4. The primary datanode goes down while the write is in progress (e.g. a network fluctuation between the namenode and datanode machines).

  5. When the client appends a partial chunk, calls sync, and then appends another partial chunk, it has to keep the previously written data in its buffer. For example, after appending "a" and calling sync, a subsequent append of "b" should leave "ab" in the client buffer. On the server side, when the chunk is not a multiple of 512 bytes, the datanode compares a CRC computed over the data in the block file with the CRC stored in the meta file, but while reconstructing the CRC for the block data it only reads up to the initial offset. For more analysis of this case, check the datanode logs.

Reference: http://www.mail-archive.com/hdfs-user@hadoop.apache.org/msg01374.html
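
To quickly rule out the first two cases, you can ask the namenode for its current datanode report, either with hadoop dfsadmin -report or from a small client program. Below is a minimal sketch against the Hadoop 1.x DistributedFileSystem API; the namenode URI is a placeholder.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode URI -- point this at your cluster.
        DistributedFileSystem dfs = (DistributedFileSystem)
                FileSystem.get(URI.create("hdfs://namenode-host:9000"), conf);
        // Ask the namenode which datanodes it currently knows about and how much space they report.
        for (DatanodeInfo node : dfs.getDataNodeStats()) {
            System.out.println(node.getName()
                    + " capacity=" + node.getCapacity()
                    + " remaining=" + node.getRemaining());
        }
        dfs.close();
    }
}

If no datanode is printed, or the remaining space is close to zero, that by itself explains the "replicated to 0 nodes" error.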

Answer 3 (Bombasti) · 2020-01-26 03:53

I also think you should make sure all the datanodes are up before you copy anything into DFS; in some cases this takes a while. I suspect that's why the "check the health status" advice works: you go to the health status web page and wait until everything is up. My five cents.

Answer 4 (虎瘦雄心在) · 2020-01-26 03:56

This is your issue: the client can't communicate with the datanode, because the IP the client received for the datanode is the internal IP rather than the public IP. Take a look at this:

http://www.hadoopinrealworld.com/could-only-be-replicated-to-0-nodes/

Look at this source code from DFSClient$DFSOutputStream (Hadoop 1.2.1):

//
// Connect to first DataNode in the list.
//
success = createBlockOutputStream(nodes, clientName, false);

if (!success) {
  LOG.info("Abandoning " + block);
  namenode.abandonBlock(block, src, clientName);

  if (errorIndex < nodes.length) {
    LOG.info("Excluding datanode " + nodes[errorIndex]);
    excludedNodes.add(nodes[errorIndex]);
  }

  // Connection failed. Let's wait a little bit and retry
  retry = true;
}

The key thing to understand here is that the namenode only provides the list of datanodes on which to store the blocks; it does not write the data to the datanodes itself. It is the client's job to write the data to the datanodes using DFSOutputStream. Before any write can begin, the code above makes sure the client can communicate with the datanode(s); if communication with a datanode fails, that datanode is added to excludedNodes.
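
If this is indeed your cause, one possible mitigation is to make the client contact datanodes by hostname instead of the IP the namenode hands out, and resolve that hostname to the public address on the client machine (for example via /etc/hosts). This relies on the dfs.client.use.datanode.hostname property, which only exists in newer Hadoop releases, so check your version before relying on it; the sketch below is an illustration with placeholder URI and paths, not a drop-in fix.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask the client to contact datanodes by hostname rather than the (private) IP
        // returned by the namenode. The hostname must then resolve to the public address
        // on the client machine, e.g. via an /etc/hosts entry.
        // NOTE: this property is only available in newer Hadoop releases.
        conf.set("dfs.client.use.datanode.hostname", "true");
        // Placeholder namenode URI.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:9000"), conf);
        fs.copyFromLocalFile(new Path("/tmp/pies"), new Path("/user/ubuntu/pies"));
        fs.close();
    }
}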

Answer 5 (chillily) · 2020-01-26 03:56

I have also had the same problem/error. It first occurred after I formatted the namenode using hadoop namenode -format.

After restarting Hadoop with start-all.sh, the datanode did not start or initialize. You can check this with jps; there should be five entries. If the datanode is missing, then see:

Datanode process not running in Hadoop

Hope this helps.

Answer 6 (Bombasti) · 2020-01-26 03:57

I had a similar problem setting up a single-node cluster. I realized that I hadn't configured any datanode. I added my hostname to conf/slaves, and then it worked. Hope it helps.

Answer 7 (手持菜刀,她持情操) · 2020-01-26 03:59

I had the same error on Mac OS X 10.7 (hadoop-0.20.2-cdh3u0) because the datanode was not starting.
start-all.sh produced the following output:

starting namenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
starting jobtracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
localhost: ssh: connect to host localhost port 22: Connection refused

After enabling ssh login via System Preferences -> Sharing -> Remote Login, it started to work.
The start-all.sh output changed to the following (note that the datanode now starts):

starting namenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting datanode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting secondarynamenode, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
starting jobtracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...
Password:
localhost: starting tasktracker, logging to /java/hadoop-0.20.2-cdh3u0/logs/...