Storing a DataFrame to a Hive partition table in Spark

Published 2019-07-13 09:46

Question:

I'm trying to store a stream of data coming in from a Kafka topic into a Hive partition table. I was able to convert the DStream to a DataFrame and created a Hive context. My code looks like this:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// enable dynamic partitioning so partitions are derived from the "date" column
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
newdf.registerTempTable("temp") // newdf is my dataframe
newdf.write.mode(SaveMode.Append).format("orc").partitionBy("date").saveAsTable("mytablename")

But when I deploy the app on the cluster, it says:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/tmp/spark-3f00838b-c5d9-4a9a-9818-11fbb0007076/scratch_hive_2016-10-18_23-18-33_118_769650074381029645-1, expected: hdfs://

When I try to save it as a normal table and comment out the Hive configurations, it works. But with the partition table... it gives me this error.

I also tried registering the DataFrame as a temp table and then writing that table to the partition table. Doing that also gave me the same error.

Can someone please tell me how I can solve this? Thanks.

Answer 1:

You need to have Hadoop (HDFS) configured if you are deploying the app on the cluster.

With saveAsTable, the default location that Spark saves to is controlled by the Hive metastore (based on the docs). Another option is to use saveAsParquetFile and specify the path, and then later register that path with your Hive metastore, OR use the newer DataFrameWriter interface and specify the path option: write.format(source).mode(mode).options(options).saveAsTable(tableName).
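
A minimal sketch of that second option, reusing newdf and SaveMode from the question; the HDFS path below is only an example, not a value from the original post:

newdf.write
  .mode(SaveMode.Append)
  .format("orc")
  .option("path", "hdfs:///user/hive/warehouse/mytablename") // explicit HDFS location (example path)
  .partitionBy("date")
  .saveAsTable("mytablename")

With an explicit path option, the table registered in the metastore points at that HDFS location instead of the metastore's default warehouse directory.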



Answer 2:

I figured it out. In the code for the Spark app, I set the scratch dir location as below and it worked.

sqlContext.sql("SET hive.exec.scratchdir=<hdfs location>")