could only be replicated to 0 nodes instead of min

I don't know how to fix this error:

Vertex failed, vertexName=initialmap, vertexId=vertex_1449805139484_0001_1_00, diagnostics=[Task failed, taskId=task_1449805139484_0001_1_00_000003, diagnostics=[AttemptID:attempt_1449805139484_0001_1_00_000003_0 Info:Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/hadoop/gridmix-kon/input/_temporary/1/_temporary/attempt_14498051394840_0001_m_000003_0/part-m-00003/segment-121 could only be replicated to 0 nodes instead of minReplication (=1). There are 4 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1441)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2010)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy17.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)

Any idea what's the case?

标签： hadoop hdfs yarn hadoop2 apache-tez

1条回答

小情绪 Triste *

2楼-- · 2019-01-25 13:55

This error occurs in BlockManager::chooseTarget4NewBlock() (I am referring to the latest code) code. Specific piece of code, which causes this is:

final DatanodeStorageInfo[] targets = blockplacement.chooseTarget(src,
    numOfReplicas, client, excludedNodes, blocksize, 
    favoredDatanodeDescriptors, storagePolicy);

if (targets.length < minReplication) {
  throw new IOException("File " + src + " could only be replicated to "
      + targets.length + " nodes instead of minReplication (="
      + minReplication + ").  There are "
      + getDatanodeManager().getNetworkTopology().getNumOfLeaves()
      + " datanode(s) running and "
      + (excludedNodes == null? "no": excludedNodes.size())
      + " node(s) are excluded in this operation.");
}

This occurs, when the BlockManager tries to choose a target host for storing new block of data and can not find a single host (targets.length < minReplication). minReplication is set to 1 (configuration parameter: dfs.namenode.replication.min) in hdfs-site.xml file.

This could occur due to one of the following reasons:

Data Node instances are not running
Data Node instances are unable to contact the Name Node
Data Nodes have run out of space, hence no new block of data can be allocated to them

But, in your case, error message also contains following information:

There are 4 datanode(s) running and no node(s) are excluded in this operation.

It means, there are 4 Data Nodes running and all the 4 Data Nodes were considered for placement of data, for this operation.

So, possible suspect is disk space on the Data Nodes. You can check the disk space on your Data Nodes, using the following command:

hdfs dfsadmin -report

It gives report for each of your Live Data Nodes. For e.g. in my case, I got the following:

Live datanodes (1):

Name: 192.168.56.1:50010 (192.168.56.1)
Hostname: 192.168.56.1
Decommission Status : Normal
Configured Capacity: 648690003968 (604.14 GB)
DFS Used: 193849055737 (180.54 GB)
Non DFS Used: 186164975111 (173.38 GB)
DFS Remaining: 268675973120 (250.22 GB)
DFS Used%: 29.88%
DFS Remaining%: 41.42%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun Dec 13 17:17:34 IST 2015

Check the "DFS-Remaining" and "DFS-Remaining%". That should give you an idea about the remaining space on your Data Nodes.

You can also refer to the wiki here: https://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo, which describes the reasons for this error and ways to mitigate it.

0人赞添加讨论(0) 举报

could only be replicated to 0 nodes instead of min

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间