Hive shell throws FileNotFoundException while executing a query

Posted on 2019-05-27 18:50

Question:

1) I have added the serde jar file using "ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;"

2) Created the table

3) The table is created successfully

4) But when I execute any SELECT query, it throws a FileNotFoundException:

hive> select count(*) from tab_tweets;

Query ID = hduser_20150604145353_51b4def4-11fb-4638-acac-77301c1c1806
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.io.FileNotFoundException: File does not exist: hdfs://node1:9000/home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:269)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:390)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:483)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:428)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1183)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1039)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:207)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:754)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://node1:9000/home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Answer 1:

METHOD 1: Copy the hive-serdes-1.0-SNAPSHOT.jar file from the local filesystem to HDFS:

hadoop fs -mkdir /home/hduser/softwares/hive/
hadoop fs -put /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar /home/hduser/softwares/hive/

Note: Use hdfs dfs instead of hadoop fs if you are using a recent Hadoop version.
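
After copying, you can verify the jar is now in HDFS and re-run the query; this sketch assumes the same paths as above:

hadoop fs -ls /home/hduser/softwares/hive/
hive> ADD JAR /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar;
hive> select count(*) from tab_tweets;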

METHOD 2: Change the value of hive.aux.jars.path in hive-site.xml as follows:

<property>
 <name>hive.aux.jars.path</name>
 <value>file:///home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar</value>
</property>
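
If you need to register more than one auxiliary jar, hive.aux.jars.path accepts a comma-separated list of file:// URIs; the second jar name below is just a placeholder:

<property>
 <name>hive.aux.jars.path</name>
 <value>file:///home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar,file:///home/hduser/softwares/hive/your-other-udf.jar</value>
</property>

Restart the Hive session after editing hive-site.xml so the new value takes effect.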

METHOD 3: Add hive-serdes-1.0-SNAPSHOT.jar to the Hadoop classpath, i.e., add this line to hadoop-env.sh:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar
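
To confirm the jar actually ended up on the Hadoop classpath, inspecting the output of hadoop classpath is a quick check:

hadoop classpath | tr ':' '\n' | grep serdes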

NOTE: These paths assume you have installed Hive in /home/hduser/softwares/hive. If Hive is installed elsewhere, change /home/hduser/softwares/hive to point to your Hive installation folder.



Answer 2:

Check that the jar actually exists at /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar.
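
A quick way to verify from the shell, assuming the path from the question:

ls -l /home/hduser/softwares/hive/hive-serdes-1.0-SNAPSHOT.jar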



Answer 3:

Note: There is no need to copy hive-serdes-1.0-SNAPSHOT.jar into HDFS; keep it on the local filesystem.

At query execution time, Hive will take care of making it available on all nodes via the Distributed Cache.

For details, refer to the official Hive documentation on Hive Resources:

Once a resource is added to a session, Hive queries can refer to it by its name (in map/reduce/transform clauses), and the resource is available locally at execution time on the entire Hadoop cluster. Hive uses Hadoop's Distributed Cache to distribute the added resources to all the machines in the cluster at query execution time.
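
As a small illustration of the resource lifecycle within a session (the jar path is a placeholder):

hive> add jar /local/fs/path/to/your/file.jar;
hive> list jars;    -- show resources added in this session
hive> delete jar /local/fs/path/to/your/file.jar;    -- remove it again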

You can add additional jars in multiple ways:

  • For the current Hive session:

hive> add jar /local/fs/path/to/your/file.jar;

hive> list jars; -- to check

  • Adding it to a .hiverc file (analogous to .bashrc) on the node from which you run Hive; see the sketch after this list:

    cd $HOME

    Create a file named .hiverc and add this line to it:

    add jar /local/fs/path/to/your/file.jar

    Use cat $HOME/.hiverc to verify the contents.

  • Adding the jar file to hive-site.xml:

    <property>
     <name>hive.aux.jars.path</name>
     <value>file:///home/user/path/to/your/hive-serdes-1.0-SNAPSHOT.jar</value>
    </property>
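
A one-line way to set up the .hiverc entry from the second bullet (the jar path is a placeholder):

echo 'add jar /local/fs/path/to/your/file.jar;' >> $HOME/.hiverc

Every new Hive CLI session on that node will then register the jar automatically at startup.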