org.apache.hadoop.hive.ql.metadata.Hive.loadDynami

Posted 2019-08-21 00:47

Question:

I'm seeing some strange behavior. My use case is to write a Spark DataFrame to a partitioned Hive table using:

sqlContext.sql("INSERT OVERWRITE TABLE <table> PARTITION (<partition column>) SELECT * FROM <temp table from dataframe>")

The strange thing is that this works from the pyspark shell on host A, but the exact same code, connected to the same cluster and using the same Hive table, does not work in Jupyter notebooks. It throws:

java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions

So it seems to me that there is a jar mismatch between the host where the pyspark shell is launched and the host where Jupyter is launched. My questions are: how can I determine, from code, which version of the corresponding jar is being used in the pyspark shell and in the Jupyter notebook (I have no access to the Jupyter server)? And why would two distinct versions be in use if both the pyspark shell and Jupyter connect to the same cluster?
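One way to check this from code is to ask the driver JVM where it loaded the Hive class from. The following is only a rough diagnostic sketch, assuming `sc` is the active SparkContext in each environment. It goes through the private `sc._jvm` py4j handle, and the helper names `jar_of_class` and `hive_exec_version` are made up for illustration. It reports what the thread's context classloader sees on the driver, which is not necessarily the loader Spark uses internally for its Hive client.

```python
import re

HIVE_CLASS = "org.apache.hadoop.hive.ql.metadata.Hive"


def jar_of_class(sc, class_name=HIVE_CLASS):
    """Return the location (usually a jar URL) the driver JVM loads class_name from.

    Uses the private sc._jvm handle, so treat this as a diagnostic only.
    """
    jvm = sc._jvm
    loader = jvm.java.lang.Thread.currentThread().getContextClassLoader()
    if loader is None:
        # Fall back to the system classloader if no context loader is set.
        loader = jvm.java.lang.ClassLoader.getSystemClassLoader()
    cls = loader.loadClass(class_name)
    source = cls.getProtectionDomain().getCodeSource()
    # getCodeSource() can be null, for example for bootstrap classes.
    return None if source is None else source.getLocation().toString()


def hive_exec_version(location):
    """Extract the version from a path like .../hive-exec-<version>.jar, or None."""
    match = re.search(r"hive-exec-(.+?)\.jar", location or "")
    return match.group(1) if match else None
```

Running `hive_exec_version(jar_of_class(sc))` in both the pyspark shell and the notebook should show whether the two drivers pick up different hive-exec jars.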

Update: After some research I found that Jupyter uses Livy, and that the Livy host uses hive-exec-2.0.1.jar, while the host where we run the pyspark shell uses hive-exec-1.2.1000.2.5.3.58-3.jar. I downloaded both jars from the Maven repository and decompiled them. Although the loadDynamicPartitions method exists in both, the method signatures (parameters) differ: the boolean holdDDLTime parameter is missing in the Livy version.
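The same signature comparison can be done without decompiling, by listing the method's overloads through Java reflection from the running session. This is a hedged sketch, again assuming `sc` is the active SparkContext and relying on the private `sc._jvm` py4j handle; `load_java_class` and `method_signatures` are hypothetical helper names, not part of any Spark API.

```python
HIVE_CLASS = "org.apache.hadoop.hive.ql.metadata.Hive"


def load_java_class(sc, class_name=HIVE_CLASS):
    """Load class_name in the driver JVM and return the java.lang.Class handle."""
    jvm = sc._jvm
    loader = jvm.java.lang.Thread.currentThread().getContextClassLoader()
    if loader is None:
        # Fall back to the system classloader if no context loader is set.
        loader = jvm.java.lang.ClassLoader.getSystemClassLoader()
    return loader.loadClass(class_name)


def method_signatures(java_class, method_name="loadDynamicPartitions"):
    """Return the sorted signature strings of every public method named method_name."""
    return sorted(
        m.toString() for m in java_class.getMethods() if m.getName() == method_name
    )
```

Printing `method_signatures(load_java_class(sc))` in each environment and comparing the parameter lists should show whether holdDDLTime is present in the jar that driver actually loads.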

Answer 1:

I had a similar problem. Try getting the Maven dependencies from Cloudera:

<dependencies>
    <!-- Scala and Spark dependencies -->

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0-cdh5.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.0-cdh5.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.6.0-cdh5.9.2</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>1.1.0-cdh5.9.2</version>
    </dependency>
    <dependency>
        <groupId>org.scalatest</groupId>
        <artifactId>scalatest_2.10</artifactId>
        <version>3.0.0-SNAP4</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.4.1</version>
    </dependency>
    <dependency>
        <groupId>commons-dbcp</groupId>
        <artifactId>commons-dbcp</artifactId>
        <version>1.2.2</version>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.10</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-xml_2.10</artifactId>
        <version>0.2.0</version>
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.0.12</version>
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-s3</artifactId>
        <version>1.11.172</version>
    </dependency>
    <dependency>
        <groupId>com.github.scopt</groupId>
        <artifactId>scopt_2.10</artifactId>
        <version>3.2.0</version>
    </dependency>
    <dependency>
        <groupId>javax.mail</groupId>
        <artifactId>mail</artifactId>
        <version>1.4</version>
    </dependency>
</dependencies>
<repositories>
    <repository>
        <id>maven-hadoop</id>
        <name>Hadoop Releases</name>
        <url>https://repository.cloudera.com/content/repositories/releases/</url>
    </repository>
    <repository>
        <id>cloudera-repos</id>
        <name>Cloudera Repos</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>