I have imported a json file in Spark and convertd it into a table as
myDF.registerTempTable("myDF")
I then want to run SQL queries on this resulting table
val newTable = sqlContext.sql("select column-1 from myDF")
However this gives me an error because of the hypen in the name of the column column-1
. How do I resolve this is Spark SQL?
Backticks (`) appear to work, so
val newTable = sqlContext.sql("select `column-1` from myDF")
should do the trick, at least in Spark v1.3.x.
Was at it for a bit yesterday, turns out there is a way to escape the (:) and a (.) like so:
Only the field containing (:) needs to be escaped with backticks
sqlc.select("select `sn2:AnyAddRq`.AnyInfo.noInfo.someRef.myInfo.someData.Name AS sn2_AnyAddRq_AnyInfo_noInfo_someRef_myInfo_someData_Name from masterTable").show()
I cannot comment as I have less than 50 reps
When you are referencing a json structure with struct.struct.field and there is a namespace present like:
ns2:struct.struct.field the backticks(`) does not work.
jsonDF = sqlc.read.load('jsonMsgs', format="json")
jsonDF.registerTempTable("masterTable")
sqlc.select("select `sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData.Name` AS sn2_AnyAddRq_AnyInfo_noInfo_someRef_myInfo_someData_Name from masterTable").show()
pyspark.sql.utils.AnalysisException: u"cannot resolve 'sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData.Name
'
If I remove the sn2: fields, the query executes.
I have also tried with single quote ('), backslash (\) and double quotes("")
The only way it works if if I register another temp table on the sn2: strucutre, I am able access the fields within it like so
anotherDF = jsonDF.select("sn2:AnyAddRq.AnyInfo.noInfo.someRef.myInfo.someData")
anotherDF.registerTempTable("anotherDF")
sqlc.select("select Name from anotherDF").show()