Datanode process not running in Hadoop

Posted 2020-02-02 04:05

Question:

I set up and configured a multi-node Hadoop cluster using this tutorial.

When I type in the start-all.sh command, it shows all the processes initializing properly as follows:

starting namenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-namenode-jawwadtest1.out
jawwadtest1: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest1.out
jawwadtest2: starting datanode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-datanode-jawwadtest2.out
jawwadtest1: starting secondarynamenode, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-secondarynamenode-jawwadtest1.out
starting jobtracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-jobtracker-jawwadtest1.out
jawwadtest1: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest1.out
jawwadtest2: starting tasktracker, logging to /usr/local/hadoop/libexec/../logs/hadoop-root-tasktracker-jawwadtest2.out

However, when I type the jps command, I get the following output:

31057 NameNode
4001 RunJar
6182 RunJar
31328 SecondaryNameNode
31411 JobTracker
32119 Jps
31560 TaskTracker

As you can see, there's no datanode process running. I tried configuring a single-node cluster but got the same problem. Does anyone have any idea what could be going wrong here? Are there any configuration files that are not mentioned in the tutorial, or that I may have overlooked? I am new to Hadoop and am kind of lost, so any help would be greatly appreciated.

EDIT: hadoop-root-datanode-jawwadtest1.log:

STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/$
************************************************************/
2012-08-09 23:07:30,717 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loa$
2012-08-09 23:07:30,734 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:30,735 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:30,736 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:31,018 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapt$
2012-08-09 23:07:31,024 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl:$
2012-08-09 23:07:32,366 INFO org.apache.hadoop.ipc.Client: Retrying connect to $
2012-08-09 23:07:37,949 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: $
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(Data$
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransition$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNo$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNod$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode($
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataN$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.$
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1$

2012-08-09 23:07:37,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: S$
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at jawwadtest1/198.101.220.90
************************************************************/
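The log excerpt above appears to be cut off at the terminal width (the trailing `$` on each line), so the actual ERROR message is not visible. A minimal sketch for pulling the complete ERROR lines and the stack trace that follows them out of the log file; the helper name `show_datanode_errors` and the default of 10 context lines are assumptions for illustration, not part of Hadoop.

```shell
# Print every ERROR line of a Hadoop log together with the lines that
# follow it (normally the stack trace).
# Hypothetical helper, not shipped with Hadoop.
# Usage: show_datanode_errors LOGFILE [CONTEXT_LINES]
show_datanode_errors() {
    logfile=$1
    context=${2:-10}   # number of lines to show after each ERROR line
    if [ ! -r "$logfile" ]; then
        echo "cannot read log file: $logfile" >&2
        return 1
    fi
    # -A prints CONTEXT_LINES lines of trailing context after each match
    grep -A "$context" ' ERROR ' "$logfile"
}
```

For the cluster in the question this would be called as `show_datanode_errors /usr/local/hadoop/logs/hadoop-root-datanode-jawwadtest1.log`, assuming the .log file sits next to the .out file named by start-all.sh.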

Answer 1:

You need to do something like this:

  • bin/stop-all.sh (or stop-dfs.sh and stop-yarn.sh in the 2.x series)
  • rm -Rf /app/tmp/hadoop-your-username/*
  • bin/hadoop namenode -format (or hdfs namenode -format in the 2.x series)

The solution was taken from http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-troubleshooting/. Basically it consists of restarting from scratch, so make sure you won't lose data by formatting HDFS.



Answer 2:

I ran into the same issue. I had created an HDFS folder '/home/username/hdfs' with the sub-directories name, data, and tmp, which were referenced in the config XML files in hadoop/conf.

When I started Hadoop and ran jps, I couldn't find the datanode, so I tried to start it manually using bin/hadoop datanode. The error message then showed a permissions issue accessing dfs.data.dir=/home/username/hdfs/data/, which was referenced in one of the Hadoop config files. All I had to do was stop Hadoop, delete the contents of the /home/username/hdfs/tmp/* directory, run chmod -R 755 /home/username/hdfs/, and then start Hadoop again. After that, the datanode showed up!



Answer 3:

I faced a similar issue while running the datanode. The following steps were useful.

  1. In the [hadoop_directory]/sbin directory, use ./stop-all.sh to stop all the running services.
  2. Remove the tmp dir using rm -r [hadoop_directory]/tmp (the path configured in [hadoop_directory]/etc/hadoop/core-site.xml).
  3. sudo mkdir [hadoop_directory]/tmp (make a new tmp directory).
  4. Go to the */hadoop_store/hdfs directory, where you created namenode and datanode as sub-directories (the paths configured in [hadoop_directory]/etc/hadoop/hdfs-site.xml). Use

    rm -r namenode

    rm -r datanode

  5. In the */hadoop_store/hdfs directory, use

    sudo mkdir namenode

    sudo mkdir datanode

     In case of a permission issue, use

    chmod -R 755 namenode

    chmod -R 755 datanode

  6. In [hadoop_directory]/bin, use

    hadoop namenode -format (to format your namenode)

  7. In the [hadoop_directory]/sbin directory, use ./start-all.sh or ./start-dfs.sh to start the services.
  8. Use jps to check the services running.


Answer 4:

I was having the same problem running a single-node pseudo-distributed instance. Couldn't figure out how to solve it, but a quick workaround is to manually start a DataNode with
hadoop-x.x.x/bin/hadoop datanode



Answer 5:

  1. Stop the dfs and yarn first.
  2. Remove the datanode and namenode directories as specified in the core-site.xml file.
  3. Re-create the directories.
  4. Then re-start the dfs and the yarn as follows.

    start-dfs.sh

    start-yarn.sh

    mr-jobhistory-daemon.sh start historyserver

    Hope this works fine.



Answer 6:

Please check whether the tmp directory property in core-site.xml points to a valid directory:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp</value>
</property>

If the directory is misconfigured, the datanode process will not start properly.
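As a rough sketch of that check, the following reads a property value out of a Hadoop XML config file and verifies that the hadoop.tmp.dir directory exists and is writable. The function names `get_hadoop_property` and `check_hadoop_tmp_dir` are hypothetical, and the parsing assumes the simple layout shown above, with `<name>` and `<value>` each written in full on a single line; it is not a general XML parser.

```shell
# Print the <value> that follows <name>KEY</name> in a Hadoop config file.
# Hypothetical helper; assumes each <name>/<value> element fits on one line.
# Usage: get_hadoop_property FILE KEY
get_hadoop_property() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")      # drop everything up to the opening tag
            sub(/<\/value>.*/, "")    # drop the closing tag and what follows
            print
            exit
        }
    ' "$1"
}

# Report whether hadoop.tmp.dir from the given core-site.xml is usable.
# Usage: check_hadoop_tmp_dir /path/to/core-site.xml
check_hadoop_tmp_dir() {
    dir=$(get_hadoop_property "$1" hadoop.tmp.dir)
    if [ -z "$dir" ]; then
        echo "hadoop.tmp.dir is not set in $1"
        return 0
    fi
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing or not writable: $dir" >&2
        return 1
    fi
}
```

It has to be run as the same user that starts the Hadoop daemons, otherwise the writability test says nothing about what the datanode will see.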



Answer 7:

Run the commands below in order:

  1. stop-all.sh (stop all the Hadoop processes)
  2. rm -r /usr/local/hadoop/tmp/ (your Hadoop tmp directory, as configured in hadoop/conf/core-site.xml)
  3. sudo mkdir /usr/local/hadoop/tmp (make the same directory again)
  4. hadoop namenode -format (format your namenode)
  5. start-all.sh (start all the Hadoop processes)
  6. jps (it will show the running processes)


Answer 8:

Follow these steps and your datanode will start again.

  1. Stop dfs.
  2. Open hdfs-site.xml.
  3. Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.
  4. Then remove the hadoopdata directory, add the data.dir and name.dir properties back to hdfs-site.xml, and format the namenode again.
  5. Then start dfs again.


Answer 9:

Stop all the services: ./stop-all.sh

Clear the HDFS tmp directory on the master and on every slave. Don't forget to do this on the slaves.

Format the namenode (hadoop namenode -format).

Now start the services on the namenode: ./bin/start-all.sh

This is what made the datanode service start for me.



Answer 10:

Step 1: Run stop-all.sh

Step 2: Go to this path:

cd /usr/local/hadoop/bin

Step 3: Run the command hadoop datanode

The DataNode should now be running.



Answer 11:

You need to follow 3 steps.

(1) Go to the logs and check the most recent log (in hadoop-2.6.0/logs/hadoop-user-datanode-ubuntu.log).

If the error looks like this:

java.io.IOException: Incompatible clusterIDs in /home/kutty/work/hadoop2data/dfs/data: namenode clusterID = CID-c41df580-e197-4db6-a02a-a62b71463089; datanode clusterID = CID-a5f4ba24-3a56-4125-9137-fa77c5bb07b1

then the namenode cluster ID and the datanode cluster ID are not identical.

(2) Copy the namenode clusterID, which is CID-c41df580-e197-4db6-a02a-a62b71463089 in the error above.

(3) Replace the datanode cluster ID with the namenode cluster ID in hadoopdata/dfs/data/current/VERSION:

clusterID=CID-c41df580-e197-4db6-a02a-a62b71463089

Restart Hadoop. The DataNode should now run.
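The edit in step (3) can also be scripted. The sketch below is one possible way to do it: instead of copying the ID from the error message, it reads the clusterID from the namenode's own VERSION file, which is an assumption about where that file can be found on your installation (normally under the namenode directory's current/ subdirectory). The function name `sync_cluster_id` is hypothetical. A backup of the datanode file is kept with a .bak suffix.

```shell
# Copy the clusterID from the namenode's VERSION file into the datanode's
# VERSION file, keeping a backup of the original datanode file.
# Hypothetical helper, not shipped with Hadoop.
# Usage: sync_cluster_id NAMENODE_VERSION_FILE DATANODE_VERSION_FILE
sync_cluster_id() {
    nn_version=$1
    dn_version=$2
    # Extract the value of the clusterID=... line
    nn_id=$(sed -n 's/^clusterID=//p' "$nn_version")
    if [ -z "$nn_id" ]; then
        echo "no clusterID found in $nn_version" >&2
        return 1
    fi
    cp "$dn_version" "$dn_version.bak" || return 1
    # Rewrite only the clusterID line; every other line is copied unchanged
    sed "s/^clusterID=.*/clusterID=$nn_id/" "$dn_version.bak" > "$dn_version"
}
```

Stop the cluster before running it, and restart afterwards so the datanode re-reads the file.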



Answer 12:

Check whether the hadoop.tmp.dir property in core-site.xml is set correctly. If you set it, navigate to that directory and remove or empty it. If you didn't set it, navigate to its default folder, /tmp/hadoop-${user.name}, and likewise remove or empty that directory.



Answer 13:

I found the details of the issue in the log file: "Invalid directory in dfs.data.dir: Incorrect permission for /home/hdfs/dnman1, expected: rwxr-xr-x, while actual: rwxrwxr-x". From that I saw that the permissions on my datanode folder were too open (rwxrwxr-x is 775). I corrected them to 755 and it started working.
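A small sketch of the same check done before starting the datanode, so the mismatch shows up without digging through the log. It assumes GNU coreutils `stat` (the `-c '%a'` form; BSD/macOS `stat` uses different options), and the function name `check_data_dir_mode` is hypothetical. The default of 755 is taken from the "expected: rwxr-xr-x" in the log message quoted above.

```shell
# Compare a directory's octal permission bits with the expected mode.
# Hypothetical helper; requires GNU stat.
# Usage: check_data_dir_mode DIR [EXPECTED_MODE]
check_data_dir_mode() {
    dir=$1
    expected=${2:-755}   # rwxr-xr-x, as reported in the datanode log
    actual=$(stat -c '%a' "$dir") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok: $dir has mode $actual"
    else
        echo "mismatch: $dir has mode $actual, expected $expected"
        echo "fix with: chmod $expected $dir"
        return 1
    fi
}
```

With the directory from this answer, the call would be `check_data_dir_mode /home/hdfs/dnman1`.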



Answer 14:

Instead of deleting everything under the "hadoop tmp dir", you can set another one. For example, if your core-site.xml has this property:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp</value>
</property>

You can change this to:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/data/tmp2</value>
</property>

Then scp core-site.xml to each node, run "hadoop namenode -format", and restart Hadoop.



Answer 15:

This is for newer versions of Hadoop (I am running 2.4.0).

  • In this case, stop the cluster: sbin/stop-all.sh
  • Then go to /etc/hadoop for the config files.

In the file hdfs-site.xml, look for the directory paths corresponding to dfs.namenode.name.dir and dfs.datanode.data.dir.

  • Delete both directories recursively (rm -r).
  • Now format the namenode via bin/hadoop namenode -format
  • And finally run sbin/start-all.sh

Hope this helps.



Answer 16:

You need to check the namespaceID in these two files:

/app/hadoop/tmp/dfs/data/current/VERSION and /app/hadoop/tmp/dfs/name/current/VERSION

Your datanode will run only if its namespaceID is the same as the namenode's namespaceID.

If they differ, copy the namenode's namespaceID into the datanode's VERSION file using vi or gedit, save, and restart the daemons; it should then work.
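Before editing anything, the two IDs can be compared from the command line. This is a read-only sketch that assumes the VERSION files are plain key=value text with one `namespaceID=` line each; the function names `version_field` and `same_namespace_id` are hypothetical.

```shell
# Print the value of KEY from a key=value VERSION file.
# Usage: version_field FILE KEY
version_field() {
    sed -n "s/^$2=//p" "$1"
}

# Show both namespaceIDs and succeed only if they are present and equal.
# Hypothetical helper; changes nothing on disk.
# Usage: same_namespace_id NAMENODE_VERSION_FILE DATANODE_VERSION_FILE
same_namespace_id() {
    name_id=$(version_field "$1" namespaceID)
    data_id=$(version_field "$2" namespaceID)
    echo "namenode namespaceID=$name_id"
    echo "datanode namespaceID=$data_id"
    [ -n "$name_id" ] && [ "$name_id" = "$data_id" ]
}
```

With the paths from this answer: `same_namespace_id /app/hadoop/tmp/dfs/name/current/VERSION /app/hadoop/tmp/dfs/data/current/VERSION && echo match`.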



Answer 17:

If formatting the tmp directory does not work, try this:

  1. First stop all the daemons (namenode, datanode, etc.). You will have some script or command to do that.
  2. Format the tmp directory.
  3. Go to /var/cache/hadoop-hdfs/hdfs/dfs/ and manually delete all the contents of the directory.
  4. Now format your namenode again.
  5. Start all the daemons, then use the jps command to confirm that the datanode has started.
  6. Now run whichever application you have.

Hope this helps.



Answer 18:

  1. I configured hadoop.tmp.dir in conf/core-site.xml
  2. I configured dfs.data.dir in conf/hdfs-site.xml
  3. I configured dfs.name.dir in conf/hdfs-site.xml
  4. Deleted everything under "/tmp/hadoop-/" directory
  5. Changed file permissions from 777 to 755 for directory listed under dfs.data.dir

    And the data node started working.



Answer 19:

Even after removing and remaking the directories, the datanode wasn't starting. So I started it manually using bin/hadoop datanode. The command never returned. I opened another terminal as the same user and ran jps, and it showed the running datanode process. It's working, but I have to keep the terminal with the unfinished command open on the side.



Answer 20:

Follow these steps and your datanode will start again.

  1. Stop dfs.
  2. Open hdfs-site.xml.
  3. Remove the data.dir and name.dir properties from hdfs-site.xml and format the namenode again.
  4. Then start dfs again.



Answer 21:

Got the same error. I tried starting and stopping dfs several times and cleared all the directories mentioned in the previous answers, but nothing helped.

The issue was resolved only after rebooting the OS and configuring Hadoop from scratch. (Configuring Hadoop from scratch without rebooting didn't work.)



Answer 22:

When I was not able to find the data node using jps, I deleted the current folder in the Hadoop data directory (/opt/hadoop-2.7.0/hadoop_data/dfs/data), restarted Hadoop using start-all.sh, and ran jps again.

This time I could find the data node and current folder was created again.



Answer 23:

Try this

  1. stop-all.sh
  2. vi hdfs-site.xml
  3. change the value given for property dfs.data.dir
  4. format namenode
  5. start-all.sh


Answer 24:

Delete the datanode directory under your Hadoop folder, then rerun start-all.sh.



Answer 25:

On macOS (pseudo-distributed mode):

Open terminal

  1. Stop dfs. 'sbin/stop-all.sh'.
  2. cd /tmp
  3. rm -rf hadoop*
  4. Navigate to hadoop directory. Format the hdfs. bin/hdfs namenode -format
  5. sbin/start-dfs.sh


Answer 26:

I applied a mix of configuration changes, and it worked for me.
First >>
Stop all Hadoop services using ${HADOOP_HOME}/sbin/stop-all.sh

Second >>
Check mapred-site.xml, which is located at ${HADOOP_HOME}/etc/hadoop/mapred-site.xml, and change localhost to master.

Third >>
Remove the temporary folder created by Hadoop:
rm -rf /path/to/your/hadoop/temp/folder

Fourth >>
Set permissions recursively on the temp folder:
sudo chmod -R 777 /path/to/your/hadoop/temp/folder

Fifth >>
Now start all the services again, and first check that all services, including the datanode, are running.



Answer 27:

Error in datanode.log file

$ more /usr/local/hadoop/logs/hadoop-hduser-datanode-ubuntu.log

Shows:

java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop_tmp/hdfs/datanode: namenode clusterID = CID-e4c3fed0-c2ce-4d8b-8bf3-c6388689eb82; datanode clusterID = CID-2fcfefc7-c931-4cda-8f89-1a67346a9b7c

Solution: Stop your cluster and issue the below command & then start your cluster again.

sudo rm -rf  /usr/local/hadoop_tmp/hdfs/datanode/*


Answer 28:

  • Erase the directories where the dfs data and name files are stored.

In my case, I have Hadoop on Windows under C:/. According to core-site.xml and the other config files, the directories were tmp/Administrator/dfs/data, tmp/Administrator/dfs/name, and so on, so I erased them.

Then run namenode -format and try again.



Answer 29:

    mv /usr/local/hadoop_store/hdfs/datanode /usr/local/hadoop_store/hdfs/datanode.backup

    mkdir /usr/local/hadoop_store/hdfs/datanode

    hadoop datanode OR start-all.sh

    jps