How to access an existing table in Hive?

Posted 2019-02-19 04:23

Question:

I am trying to access Hive from a Spark application written in Scala.

My code:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

val hiveLocation = "hdfs://master:9000/user/hive/warehouse"
val conf = new SparkConf()
  .setAppName("SOME APP NAME")
  .setMaster("local[*]")
  .set("spark.sql.warehouse.dir", hiveLocation)

val sc = new SparkContext(conf)

val spark = SparkSession
  .builder()
  .appName("SparkHiveExample")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", hiveLocation)
  .config("spark.driver.allowMultipleContexts", "true")
  .enableHiveSupport()
  .getOrCreate()

println("Start of SQL Session--------------------")
spark.sql("select * from test").show()
println("End of SQL session-------------------")

But it fails with the error message:

Table or view not found

However, when I run show tables; in the Hive console, I can see that table and can run select * from test. All the tables are under the "user/hive/warehouse" location. Just for testing, I also tried creating a table from Spark, to find out where the table ends up.

val spark = SparkSession
  .builder()
  .appName("SparkHiveExample")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", hiveLocation)
  .config("spark.driver.allowMultipleContexts", "true")
  .enableHiveSupport()
  .getOrCreate()

println("Start of SQL Session--------------------")
spark.sql("CREATE TABLE IF NOT EXISTS test11(name String)")
println("End of SQL session-------------------")

This code also executed successfully, but the strange thing is that I cannot find this table from the Hive console.

Even when I run select * from TBLS; in MySQL (in my setup MySQL is configured as the metastore for Hive), I do not find the tables that were created from Spark.
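To see where Spark actually stored the table it created, one quick check (a sketch, run in the same Spark session as the CREATE TABLE above) is:

// Ask Spark's catalog for the full metadata of test11. Without a
// hive-site.xml on the classpath, the Location row typically points at
// a local ./spark-warehouse directory next to an embedded Derby
// metastore_db, not at the MySQL-backed Hive metastore.
spark.sql("DESCRIBE FORMATTED test11").show(100, truncate = false)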

Is Spark's warehouse location different from the Hive console's?

What do I have to do to access an existing Hive table from Spark?

Answer 1:

From the Spark SQL programming guide:

Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/.

When working with Hive, one must instantiate SparkSession with Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions. Users who do not have an existing Hive deployment can still enable Hive support. When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory in which the Spark application is started.
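In other words, without a hive-site.xml Spark quietly creates its own local Derby metastore and spark-warehouse directory. A small sketch to confirm which warehouse directory a session actually resolved:

// Print the effective warehouse setting for this SparkSession. A local
// ./spark-warehouse path here (plus a metastore_db directory appearing
// in the working directory) means Spark is using its own embedded
// metastore rather than the Hive one.
println(spark.conf.get("spark.sql.warehouse.dir"))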

You need to add a hive-site.xml config file to the resource directory. Here are the minimum values needed for Spark to work with Hive (set the host to the host where your Hive metastore runs):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://host:9083</value>
        <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
    </property>

</configuration>
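Alternatively (a sketch, assuming your metastore listens on the default thrift port 9083), the same property can be set directly on the SparkSession builder. Note that you do not need a separate SparkContext or allowMultipleContexts at all, since getOrCreate() exposes the context as spark.sparkContext:

import org.apache.spark.sql.SparkSession

// Point Spark at the existing Hive metastore directly from code.
// Replace "host" with the machine running the Hive metastore service.
val spark = SparkSession
  .builder()
  .appName("SparkHiveExample")
  .master("local[*]")
  .config("hive.metastore.uris", "thrift://host:9083")
  .enableHiveSupport()
  .getOrCreate()

// Reuse the session's context instead of creating a second one.
val sc = spark.sparkContext

spark.sql("select * from test").show()

This assumes the Hive metastore service is actually running and listening on that port (it is a separate process from the Hive CLI).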