I am using the Hive metastore on EMR, and I am able to query the table manually through HiveQL.
But when I use the same table in a Spark job, it fails with an "Input path does not exist" error:
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://....
I deleted that partition's path under s3://.. myself. Hive still queries the table fine without my dropping the partition at the table level, but it does not work in PySpark.
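To illustrate the mismatch, here is a sketch (the partition column dt is hypothetical; sqlContext is the one from my job below):

# The metastore still lists the partition whose S3 path I deleted:
sqlContext.sql("SHOW PARTITIONS logan_test.salary_csv").show()

# Hive tolerates the missing path; my guess is that Spark needs the stale
# partition dropped from the metastore first, something like:
# sqlContext.sql("ALTER TABLE logan_test.salary_csv DROP PARTITION (dt='...')")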
Here is my full code:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="test")
sqlContext = SQLContext(sparkContext=sc)

# Query the table registered in the Hive metastore
sqlContext.sql("select count(*) from logan_test.salary_csv").show()
print("done..")
I submitted my job as below so that it uses the Hive catalog tables:
spark-submit --files /usr/lib/hive/conf/hive-site.xml test.py
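I also came across spark.sql.hive.verifyPartitionPath; as I understand the Spark docs, setting it to true makes Spark check the partition paths under the table root and skip ones that no longer exist. A sketch of how I would pass it (untested, and whether it applies to my EMR version is an assumption):

spark-submit \
  --files /usr/lib/hive/conf/hive-site.xml \
  --conf spark.sql.hive.verifyPartitionPath=true \
  test.py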