LeaseExpiredException: No lease error on HDFS

Posted 2019-02-02 23:06

Question:

I am trying to load a large amount of data into HDFS, and I sometimes get the error below. Any idea why?

The error:

org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /data/work/20110926-134514/_temporary/_attempt_201109110407_0167_r_000026_0/hbase/site=3815120/day=20110925/107-107-3815120-20110926-134514-r-00026 File does not exist. Holder DFSClient_attempt_201109110407_0167_r_000026_0 does not have any open files.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1557)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1548)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:1603)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:1591)
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:675)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy1.complete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.complete(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3566)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3481)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:966)
at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1297)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:78)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs$RecordWriterWithCounter.close(MultipleOutputs.java:303)
at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:456)
at com.my.hadoop.platform.sortmerger.MergeSortHBaseReducer.cleanup(MergeSortHBaseReducer.java:145)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:178)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)

Answer 1:

I managed to fix the problem:

When the job finishes, it deletes the /data/work/ folder. If several jobs are running in parallel, that deletion also removes the files of another job; in my case the job does actually need to delete /data/work/.

In other words, this exception is thrown when a job tries to access files that no longer exist.
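One way to avoid the clash between parallel jobs is to give each job its own work subdirectory and only delete that subdirectory during cleanup. The sketch below is not from the original answer; the path layout and job-id naming are assumptions for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical cleanup: each job works under its own subfolder of /data/work,
// so deleting it cannot touch the files of a job running in parallel.
public class PerJobWorkDir {
    public static void cleanup(Configuration conf, String jobId) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path jobWorkDir = new Path("/data/work/" + jobId);  // e.g. /data/work/20110926-134514
        fs.delete(jobWorkDir, true);                        // recursive delete of this job's folder only
    }
}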



Answer 2:

I ran into the same problem when using Spark Streaming to saveAsHadoopFile to Hadoop (2.6.0-cdh5.7.1); I use MultipleTextOutputFormat to write different data to different paths. Sometimes the exception Zohar described would occur. The immediate reason is the one Matiji66 gives:

another program read, write and delete this tmp file cause this error.

but the root cause he did not mention is Hadoop's speculative execution:

Hadoop doesn’t try to diagnose and fix slow running tasks, instead, it tries to detect them and runs backup tasks for them.

So the real reason is that your task runs slowly, and Hadoop launches another task to do the same thing (in my case, to save data to a file on Hadoop). When the first of the two tasks finishes, it deletes the temp file; when the other finishes, it tries to delete the same file, which no longer exists, and that is why the exception

does not have any open files

is thrown.

You can fix it by disabling speculative execution in Spark and Hadoop:

sparkConf.set("spark.speculation", "false");
sparkConf.set("spark.hadoop.mapreduce.map.speculative", "false");
sparkConf.set("spark.hadoop.mapreduce.reduce.speculative", "false")


Answer 3:

In my case, another program reading, writing, and deleting this tmp file caused the error. Try to avoid this.



Answer 4:

ROOT CAUSE

A storage policy was set on the staging directory, and hence the MapReduce job failed.

<property> <name>yarn.app.mapreduce.am.staging-dir</name> <value>/user</value> </property>

RESOLUTION

Set up a staging directory on which no storage policy is set, i.e. modify yarn.app.mapreduce.am.staging-dir in yarn-site.xml:

<property> 
<name>yarn.app.mapreduce.am.staging-dir</name> 
<value>/tmp</value> 
</property>


Answer 5:

I used Sqoop to import into HDFS and got the same error. With the help of the previous answers, I realized that I needed to remove the trailing "/" from

--target-dir /dw/data/

I used

--target-dir /dw/data

and it works fine.



Answer 6:

I encountered this problem when I changed my program to use the saveAsHadoopFile method to improve performance; in that scenario I can't make use of the DataFrame API directly (see the problem).

The reason this happens is basically what Zohar said: the saveAsHadoopFile method with MultipleTextOutputFormat does not allow multiple programs running concurrently to save files to the same directory. Once one program finishes, it deletes the common _temporary directory the others still need. I am not sure whether this is a bug in the M/R API. (2.6.0-cdh5.12.1)

You can try the solution below if you can't redesign your program:

This is the source code of FileOutputCommitter in the M/R API (you must download the version matching your cluster):

package org.apache.hadoop.mapreduce.lib.output;

public class FileOutputCommitter extends OutputCommitter {
  private static final Log LOG = LogFactory.getLog(FileOutputCommitter.class);

  /**
   * Name of directory where pending data is placed.  Data that has not been
   * committed yet.
   */
  public static final String PENDING_DIR_NAME = "_temporary";

  // ... rest of the class unchanged ...

Changes:

"_temporary"

To:

System.getProperty("[the property name you like]")

Compile this single class with all required dependencies, then create a jar with the three output class files and place the jar on your classpath (ahead of the original jar).

Or you can simply put the source file into your project.

Now you can configure the temp directory for each program by setting a different system property.
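For illustration, here is a minimal sketch of the patched line and of how each program would pick its own temp directory; the property name mapred.pending.dir.name and the "_temporary" fallback are assumptions, not part of the original answer:

// In the copied FileOutputCommitter: read the pending-dir name from a system
// property, falling back to the stock value when the property is not set.
public static final String PENDING_DIR_NAME =
        System.getProperty("mapred.pending.dir.name", "_temporary");

// In each program, choose a distinct name before the job starts, e.g.:
System.setProperty("mapred.pending.dir.name", "_temporary_program_a");
// or on the command line:
// java -Dmapred.pending.dir.name=_temporary_program_a -jar your-job.jar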

Hope this helps.



Tags: hadoop hdfs