I am trying to use Spark SQL to query a CSV file stored in Data Lake Store. When I run the query, I get "java.lang.ClassNotFoundException: Class com.microsoft.azure.datalake.store.AdlFileSystem not found".
How can I use Spark SQL to query a file stored in Data Lake Store? Please help me with a sample.
Example CSV:
Id Name Designation
1 aaa bbb
2 ccc ddd
3 eee fff
Thanks in advance, Sowandharya
It seems that you didn't configure the Cluster AAD Identity for Data Lake Store when creating the HDInsight cluster. You can try creating an HDInsight Spark cluster with Data Lake Store in the Azure portal; please see https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-hdinsight-hadoop-use-portal/
Presently, HDInsight Spark clusters are not available with Azure Data Lake Storage. Once that support is available, it will work seamlessly. In the meantime, you can use ADL Analytics to do the same job on ADLS with U-SQL queries. For reference, please visit: https://azure.microsoft.com/en-us/documentation/articles/data-lake-analytics-get-started-portal/ We are working on this support, and it is currently targeted for some time prior to summer 2016. Hope it helps.
Thanks, Sourabh.
I spent hours today trying to figure this out... leaving it here in case someone else needs help!
For Hadoop 3.0.1, ensure that the line below is uncommented in the hadoop-env.sh file:
export HADOOP_OPTIONAL_TOOLS
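Since the original question asked for a sample, here is a minimal PySpark sketch of querying a CSV in Data Lake Store with Spark SQL. It assumes the hadoop-azure-datalake connector is already on the classpath and that authentication goes through an Azure AD service principal. The helper names (adl_uri, adl_hadoop_conf, query_csv), the view name "employees", the comma delimiter, and every account or credential value are placeholders made up for illustration; none of them come from the posts above. The fs.adl.oauth2.* keys are the ones documented for the Hadoop 3.x ADLS connector, so please check them against your own version.

```python
# Sketch only: account names, paths, and credentials are hypothetical placeholders.

def adl_uri(account, path):
    """Build an adl:// URI for a file in a Data Lake Store account."""
    return "adl://{}.azuredatalakestore.net/{}".format(account, path.lstrip("/"))


def adl_hadoop_conf(tenant_id, client_id, client_secret):
    """OAuth2 client-credential settings for the adl:// connector (Hadoop 3.x key names)."""
    return {
        "fs.adl.oauth2.access.token.provider.type": "ClientCredential",
        "fs.adl.oauth2.refresh.url":
            "https://login.microsoftonline.com/{}/oauth2/token".format(tenant_id),
        "fs.adl.oauth2.client.id": client_id,
        "fs.adl.oauth2.credential": client_secret,
    }


def query_csv(uri, conf, sep=","):
    """Load the CSV into a temporary view and query it with Spark SQL."""
    # Imported lazily so the helpers above work without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("adls-csv-query")
    for key, value in conf.items():
        # The "spark.hadoop." prefix forwards a setting to the Hadoop configuration.
        builder = builder.config("spark.hadoop." + key, value)
    spark = builder.getOrCreate()

    # header=True takes the column names (Id, Name, Designation) from the first row;
    # sep is an assumption, adjust it to the file's real delimiter.
    df = spark.read.csv(uri, header=True, sep=sep, inferSchema=True)
    df.createOrReplaceTempView("employees")
    return spark.sql("SELECT Id, Name, Designation FROM employees WHERE Id > 1")
```

With real values filled in, something like query_csv(adl_uri("myaccount", "data/sample.csv"), adl_hadoop_conf(tenant, client_id, secret)).show() should display the matching rows, provided the cluster can reach the store and the connector jars are loaded.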