No data nodes are started

2019-01-16 10:22发布

站内文章 / 后端开发

12 0

放荡不羁爱自由

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I am trying to setup Hadoop version 0.20.203.0 in a pseudo distributed configuration using the following guide:

http://www.javacodegeeks.com/2012/01/hadoop-modes-explained-standalone.html

After running the start-all.sh script I run "jps".

I get this output:

4825 NameNode
5391 TaskTracker
5242 JobTracker
5477 Jps
5140 SecondaryNameNode

When I try to add information to the hdfs using:

bin/hadoop fs -put conf input

I got an error:

hadoop@m1a2:~/software/hadoop$ bin/hadoop fs -put conf input
12/04/10 18:15:31 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)

        at org.apache.hadoop.ipc.Client.call(Client.java:1030)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)

12/04/10 18:15:31 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/04/10 18:15:31 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hadoop/input/core-site.xml" - Aborting...
put: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
12/04/10 18:15:31 ERROR hdfs.DFSClient: Exception closing file /user/hadoop/input/core-site.xml : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/input/core-site.xml could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)

        at org.apache.hadoop.ipc.Client.call(Client.java:1030)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)

I am not totally sure but I believe that this may have to do with the fact that the datanode is not running.

Does anybody know what I have done wrong, or how to fix this problem?

EDIT: This is the datanode.log file:

2012-04-11 12:27:28,977 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = m1a2/139.147.5.55
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
2012-04-11 12:27:29,166 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-04-11 12:27:29,181 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-04-11 12:27:29,183 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-04-11 12:27:29,342 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-04-11 12:27:29,347 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-04-11 12:27:29,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-hadoop/dfs/data: namenode namespaceID = 301052954; datanode namespaceID = 229562149
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:268)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)

2012-04-11 12:27:29,617 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at m1a2/139.147.5.55
************************************************************/

回答1:

That error you are getting in the DN log is described here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#java-io-ioexception-incompatible-namespaceids

From that page:

At the moment, there seem to be two workarounds as described below.

Workaround 1: Start from scratch

I can testify that the following steps solve this error, but the side effects won’t make you happy (me neither). The crude workaround I have found is to:

Stop the cluster
Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml; if you followed this tutorial, the relevant directory is /app/hadoop/tmp/dfs/data
Reformat the NameNode (NOTE: all HDFS data is lost during this process!)
Restart the cluster

When deleting all the HDFS data and starting from scratch does not sound like a good idea (it might be ok during the initial setup/testing), you might give the second approach a try.

Workaround 2: Updating namespaceID of problematic DataNodes

Big thanks to Jared Stehler for the following suggestion. I have not tested it myself yet, but feel free to try it out and send me your feedback. This workaround is “minimally invasive” as you only have to edit one file on the problematic DataNodes:

Stop the DataNode
Edit the value of namespaceID in /current/VERSION to match the value of the current NameNode
Restart the DataNode

If you followed the instructions in my tutorials, the full path of the relevant files are:

NameNode: /app/hadoop/tmp/dfs/name/current/VERSION

DataNode: /app/hadoop/tmp/dfs/data/current/VERSION

(background: dfs.data.dir is by default set to

${hadoop.tmp.dir}/dfs/data, and we set hadoop.tmp.dir

in this tutorial to /app/hadoop/tmp).

If you wonder how the contents of VERSION look like, here’s one of mine:

# contents of /current/VERSION

namespaceID=393514426

storageID=DS-1706792599-10.10.10.1-50010-1204306713481

cTime=1215607609074

storageType=DATA_NODE

layoutVersion=-13

回答2:

Okay, I post this once more:

In case someone needs this, for newer version of Hadoop (basically I am running 2.4.0)

In this case stop the cluster sbin/stop-all.sh
Then go to /etc/hadoop for config files.

In the file: hdfs-site.xml Look out for directory paths corresponding to dfs.namenode.name.dir dfs.namenode.data.dir

Delete both the directories recursively (rm -r).
Now format the namenode via bin/hadoop namenode -format
And finally sbin/start-all.sh

Hope this helps.

回答3:

I had the same issue on pseudo node using hadoop1.1.2 So I ran bin/stop-all.sh to stop the cluster then saw the configuration of my hadoop tmp directory in hdfs-site.xml

<name>hadoop.tmp.dir</name>
<value>/root/data/hdfstmp</value>

So I went into /root/data/hdfstmp and deleted all the files using command (you may loose ur data)

rm -rf *

and then format namenode again

bin/hadoop namenode -format

and then start the cluster using

bin/start-all.sh

Main reason is bin/hadoop namenode -format didn't remove the old data. So we have to delete it manually.

回答4:

Do following steps:

1. bin/stop-all.sh
2. remove dfs/ and mapred/ folder of hadoop.tmp.dir in core-site.xml
3. bin/hadoop namenode -format
4. bin/start-all.sh
5. jps

回答5:

Try formatting your datanode and restart it.

回答6:

I have been using CDH4 as my version of hadoop and have been having trouble configuring it. Even after trying to reformat my namenode I was still receiving the error.

My VERSION file was located in

/var/lib/hadoop-hdfs/cache/{username}/dfs/data/current/VERSION

You can find the location of the HDFS cache directory by looking for the hadoop.tmp.dir property:

more /etc/hadoop/conf/hdfs-site.xml

I found that by doing

cd /var/lib/hadoop-hdfs/cache/
rm -rf *

and then reformatting the namenode I was finally able to fix the issue. Thanks to the first reply for helping me to figure out what folder I needed to bomb.

回答7:

I tried with the approach 2 as suggested by Jared Stehler in the Chris Shain answer and i can confirm that after doing these changes , i was able to resolve the above mentioned problem.

I used the same version number for both the name and data VERSION file. Mean to say copied the version number from file VERSION inside (/app/hadoop/tmp/dfs/name/current) to VERSION inside (/app/hadoop/tmp/dfs/data/current) and it worked like charm

Cheers !

回答8:

I encountered this issue when using an unmodified cloudera quickstart vm 4.4.0-1

For reference, the cloudera manager said my datanode was in good health, even though the error message in the DataStreamer stacktrace said no datanodes were running.

credit goes to workaround #2 from https://stackoverflow.com/a/10110369/249538 but i'll detail my specific experience using the cloudera quickstart vm.

Specifically, I did:
in this order, stop the services hue1, hive1, mapreduce1, hdfs1 via the cloudera manager http://localhost.localdomain:7180/cmf/services/status

found my VERSION files via:
sudo find / -name VERSION

i got:

/dfs/dn/current/BP-780931682-127.0.0.1-1381159027878/current/VERSION
/dfs/dn/current/VERSION
/dfs/nn/current/VERSION
/dfs/snn/current/VERSION

i checked the contents of those files, but they all had a matching namespaceID except one file was just totally missing it. so i added an entry to it.

then i restarted the services in reverse order via the cloudera manager. now i can -put stuff onto hdfs.

回答9:

In my case, I wrongly set one destination for dfs.name.dir and dfs.data.dir. The correct format is

 <property>
 <name>dfs.name.dir</name>
 <value>/path/to/name</value>
 </property>

 <property>
 <name>dfs.data.dir</name>
 <value>/path/to/data</value>
 </property>

回答10:

I have the same problem with datanode missing and i follow this step that worked for me

1.find the folder that datanode located in. cd hadoop/hadoopdata/hdfs 2.look in the folder and you will see what file you have in hdfs ls 3.delete the datanode folder because it is old version of datanode rm -rf/datanode/* 4. you will get the new version after run the previous command 5. start new datanode hadoop-daemon.sh start datanode 6. refresh the web services. You will see the lost node appears my terminal