Namenode not getting started

2019-01-07 04:00发布

问题:

I was using Hadoop in a pseudo-distributed mode and everything was working fine. But then I had to restart my computer because of some reason. And now when I am trying to start Namenode and Datanode I can find only Datanode running. Could anyone tell me the possible reason of this problem? Or am I doing something wrong?

I tried both bin/start-all.sh and bin/start-dfs.sh.

回答1:

I was facing the issue of namenode not starting. I found a solution using following:

  1. first delete all contents from temporary folder: rm -Rf <tmp dir> (my was /usr/local/hadoop/tmp)
  2. format the namenode: bin/hadoop namenode -format
  3. start all processes again:bin/start-all.sh

You may consider rolling back as well using checkpoint (if you had it enabled).



回答2:

hadoop.tmp.dir in the core-site.xml is defaulted to /tmp/hadoop-${user.name} which is cleaned after every reboot. Change this to some other directory which doesn't get cleaned on reboot.



回答3:

Following STEPS worked for me with hadoop 2.2.0,

STEP 1 stop hadoop

hduser@prayagupd$ /usr/local/hadoop-2.2.0/sbin/stop-dfs.sh

STEP 2 remove tmp folder

hduser@prayagupd$ sudo rm -rf /app/hadoop/tmp/

STEP 3 create /app/hadoop/tmp/

hduser@prayagupd$ sudo mkdir -p /app/hadoop/tmp
hduser@prayagupd$ sudo chown hduser:hadoop /app/hadoop/tmp
hduser@prayagupd$ sudo chmod 750 /app/hadoop/tmp

STEP 4 format namenode

hduser@prayagupd$ hdfs namenode -format

STEP 5 start dfs

hduser@prayagupd$ /usr/local/hadoop-2.2.0/sbin/start-dfs.sh

STEP 6 check jps

hduser@prayagupd$ $ jps
11342 Jps
10804 DataNode
11110 SecondaryNameNode
10558 NameNode


回答4:

In conf/hdfs-site.xml, you should have a property like

<property>
    <name>dfs.name.dir</name>
    <value>/home/user/hadoop/name/data</value>
</property>

The property "dfs.name.dir" allows you to control where Hadoop writes NameNode metadata. And giving it another dir rather than /tmp makes sure the NameNode data isn't being deleted when you reboot.



回答5:

Open a new terminal and start the namenode using path-to-your-hadoop-install/bin/hadoop namenode

The check using jps and namenode should be running



回答6:

Why do most answers here assume that all data needs to be deleted, reformatted, and then restart Hadoop? How do we know namenode is not progressing, but taking lots of time. It will do this when there is a large amount of data in HDFS. Check progress in logs before assuming anything is hung or stuck.

$ [kadmin@hadoop-node-0 logs]$ tail hadoop-kadmin-namenode-hadoop-node-0.log

...
016-05-13 18:16:44,405 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 117/141 transactions completed. (83%)
2016-05-13 18:16:56,968 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 121/141 transactions completed. (86%)
2016-05-13 18:17:06,122 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 122/141 transactions completed. (87%)
2016-05-13 18:17:38,321 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 123/141 transactions completed. (87%)
2016-05-13 18:17:56,562 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 124/141 transactions completed. (88%)
2016-05-13 18:17:57,690 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 127/141 transactions completed. (90%)

This was after nearly an hour of waiting on a particular system. It is still progressing each time I look at it. Have patience with Hadoop when bringing up the system and check logs before assuming something is hung or not progressing.



回答7:

If anyone using hadoop1.2.1 version and not able to run namenode, go to core-site.xml, and change dfs.default.name to fs.default.name.

And then format the namenode using $hadoop namenode -format.

Finally run the hdfs using start-dfs.sh and check for service using jps..



回答8:

In core-site.xml:

    <configuration>
       <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
       </property>
       <property>
          <name>hadoop.tmp.dir</name>
          <value>/home/yourusername/hadoop/tmp/hadoop-${user.name}
         </value>
  </property>
</configuration>

and format of namenode with :

hdfs namenode -format

worked for hadoop 2.8.1



回答9:

Did you change conf/hdfs-site.xml dfs.name.dir?

Format namenode after you change it.

$ bin/hadoop namenode -format
$ bin/hadoop start-all.sh


回答10:

If you facing this issue after rebooting the system, Then below steps will work fine

For workaround.

1) format the namenode: bin/hadoop namenode -format

2) start all processes again:bin/start-all.sh

For Perm fix: -

1) go to /conf/core-site.xml change fs.default.name to your custom one.

2) format the namenode: bin/hadoop namenode -format

3) start all processes again:bin/start-all.sh



回答11:

Faced the same problem.

(1) Always check for the typing mistakes in the configuring the .xml files, especially the xml tags.

(2) go to bin dir. and type ./start-all.sh

(3) then type jps , to check if processes are working



回答12:

Add hadoop.tmp.dir property in core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/yourname/hadoop/tmp/hadoop-${user.name}</value>
  </property>
</configuration>

and format hdfs (hadoop 2.7.1):

$ hdfs namenode -format

The default value in core-default.xml is /tmp/hadoop-${user.name}, which will be deleted after reboot.



回答13:

Try this,

1) Stop all hadoop processes : stop-all.sh

2) Remove the tmp folder manually

3) Format namenode : hadoop namenode -format

4) Start all processes : start-all.sh



回答14:

If you kept default configurations when running hadoop the port for the namenode would be 50070. You will need to find any processes running on this port and kill them first.

  • Stop all running hadoop with : bin/stop-all.sh

    check all processes running in port 50070

  • sudo netstat -tulpn | grep :50070 #check any processes running in port 50070, if there are any the / will appear at the RHS of the output.

  • sudo kill -9 <process_id> #kill_the_process.

  • sudo rm -r /app/hadoop/tmp #delete the temp folder

  • sudo mkdir /app/hadoop/tmp #recreate it

  • sudo chmod 777 –R /app/hadoop/tmp (777 is given for this example purpose only)

  • bin/hadoop namenode –format #format hadoop namenode

  • bin/start-all.sh #start-all hadoop services

Refer this blog



回答15:

For me the following worked after I changed the directory of the namenode and datanode in hdfs-site.xml

-- before executing the following steps stop all services with stop-all.sh or in my case I used the stop-dfs.sh to stop the dfs

  1. On the new configured directory, for every node (namenode and datanode), delete every folder/files inside it (in my case a 'current' directory).
  2. delete the Hadoop temporary directory: $rm -rf /tmp/haddop-$USER
  3. format the Namenode: hadoop/bin/hdfs namenode -format
  4. start-dfs.sh

After I followed those steps my namenode and datanodes were alive using the new configured directory.



回答16:

I ran $hadoop namenode to start namenode manually at foreground.

From the logs I figured out that 50070 is ocuupied, which was defaultly used by dfs.namenode.http-address. After configuring dfs.namenode.http-address in hdfs-site.xml, everything went well.



回答17:

I ran into the same thing after a restart.

for hadoop-2.7.3 all I had to do was format the namenode:

<HadoopRootDir>/bin/hdfs namenode -format

Then a jps command shows

6097 DataNode
755 RemoteMavenServer
5925 NameNode
6293 SecondaryNameNode
6361 Jps


回答18:

I got the solution just share with you that will work who got the errors:

1. First check the /home/hadoop/etc/hadoop path, hdfs-site.xml and

 check the path of namenode and datanode 

<property>
  <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>

<property>
  <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>

2.Check the permission,group and user of namenode and datanode of the particular path(/home/hadoop/hadoopdata/hdfs/datanode), and check if there are any problems in all of them and if there are any mismatch then correct it. ex .chown -R hadoop:hadoop in_use.lock, change user and group

chmod -R 755 <file_name> for change the permission


回答19:

After deleting a resource managers' data folder, the problem is gone.
Even if you have formatting cannot solve this problem.



回答20:

If your namenode is stuck in safemode you can ssh to namenode, su hdfs user and run the following command to turn off safemode:

hdfs dfsadmin -fs hdfs://server.com:8020 -safemode leave


回答21:

Instead of formatting namenode, may be you can use the below command to restart the namenode. It worked for me:

sudo service hadoop-master restart

  1. hadoop dfsadmin -safemode leave


标签: hadoop hdfs