Spark Sql query fails

2019-07-28 10:28发布

问题:

Using Sparks 2/java/Cassanda2.2 Trying to run a simple sparks sql query, it errors: Tried as below, + variations like "'LAX'", and '=' instead of '=='.

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`LAX`' given input columns: [transdate, origin]; line 1 pos 42;
'Project ['origin]
+- 'Filter (origin#1 = 'LAX)
   +- SubqueryAlias origins
      +- LogicalRDD [transdate#0, origin#1]

JavaRDD<TransByDate> originDateRDD = javaFunctions(sc).cassandraTable("trans", "trans_by_date", CassandraJavaUtil.mapRowTo(TransByDate.class)).select(CassandraJavaUtil.column("origin"), CassandraJavaUtil.column("trans_date").as("transdate"));

long cnt1= originDateRDD.count();
System.out.println("sqlLike originDateRDD.count: "+cnt1); --> 406000
Dataset<Row> originDF = sparks.createDataFrame(originDateRDD, TransByDate.class);
originDF.createOrReplaceTempView("origins");
Dataset<Row> originlike = sparks.sql("SELECT origin FROM origins WHERE origin =="+ "LAX");

I have enabled Hive Support (if that helps) Thanks

回答1:

Put the column value inside single quote. Your query should look like below.

Dataset<Row> originlike = spark.sql("SELECT origin FROM origins WHERE origin == "+"'LAX'");

You can refer Querying Cassandra data using Spark SQL in Java for more detail.

Like query should be like below.

Dataset<Row> originlike = spark.sql("SELECT origin FROM origins WHERE origin like 'LA%'");


回答2:

Hive is not the problem, here is the line that is your problem:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`LAX`' given input columns: [transdate, origin]; line 1 pos 42;

What this is saying is that among the column names, none are named LAX. The scala DSL asks for === when matching a value that is a key within a column, perahps something similar would be more ideal, something like a origins.filter($"origin === "LAX")