How to test a Spark SQL Query without Scala

Posted 2019-06-03 07:38

Question:

I am trying to figure out how to test Spark SQL queries against a Cassandra database, much as you would in SQL Server Management Studio. Currently I have to open the Spark console and type Scala commands, which is tedious and error-prone.

Something like:

scala> val query = csc.sql("select * from users")
scala> query.collect().foreach(println)

Especially with longer queries this can be a real pain.

This seems like a terribly inefficient way to check whether a query is correct and what data it returns. The other issue is that when the query is wrong, you get back a mile-long error message and have to scroll up the console to find the cause. How do I test my Spark queries without using the console or writing my own application?

Answer 1:

You could use bin/spark-sql to avoid writing a Scala program and just write SQL.

To use bin/spark-sql, you may need to rebuild Spark with the -Phive and -Phive-thriftserver build profiles.

More information is available under Building Spark. Note: do not build against Scala 2.11; the thrift server dependencies do not seem to be ready for it at the moment.



Answer 2:

You can write the SQL in a file, read it into a variable in your testing script, and pass it to ssc.sql(file.read()) (the Python way).

But it seems you are looking for something else. Perhaps a testing approach?
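
The file-based idea in this answer can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names load_statements and run_statements are hypothetical, the split on ';' is naive (it assumes no semicolons inside string literals or comments), and the SQL runner is passed in as a callable so the same code works with whatever context object you have (ssc.sql, csc.sql, or a SparkSession's sql method). Printing only the first line of each error is one way to avoid scrolling through the long error output mentioned in the question.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, List


def load_statements(path: str) -> List[str]:
    """Read a .sql file and split it into statements on ';'.

    Naive split: assumes no ';' inside string literals or comments.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [part.strip() for part in text.split(";") if part.strip()]


def run_statements(
    statements: Iterable[str],
    run_sql: Callable[[str], Any],   # e.g. ssc.sql (assumed to exist in your script)
    show: Callable[[Any], None],     # what to do with each result
) -> int:
    """Run each statement in order and return the number of failures.

    On failure, print only the first line of the error message instead of
    the full trace.
    """
    failures = 0
    for stmt in statements:
        try:
            show(run_sql(stmt))
        except Exception as exc:
            lines = str(exc).splitlines() or [repr(exc)]
            print(f"FAILED: {stmt}\n  {lines[0]}")
            failures += 1
    return failures
```

With PySpark, usage might look like run_statements(load_statements("queries.sql"), ssc.sql, lambda df: df.show()), where queries.sql is a hypothetical file name and ssc is the context object from the answer above.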



Answer 3:

Here is one example:

[donghua@vmxdb01 ~]$ $SPARK_HOME/bin/spark-sql --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.11 --conf spark.cassandra.connection.host=127.0.0.1

spark-sql> select * from kv where value > 2;

Error in query: Table or view not found: kv; line 1 pos 14

spark-sql> create TEMPORARY TABLE kv USING org.apache.spark.sql.cassandra OPTIONS (table "kv",keyspace "mykeyspace", cluster "Test Cluster",pushdown "true");

16/10/12 08:28:09 WARN SparkStrategies$DDLStrategy: CREATE TEMPORARY TABLE kv USING... is deprecated, please use CREATE TEMPORARY VIEW viewName USING... instead
Time taken: 4.008 seconds

spark-sql> select * from kv;
key1 1
key4 4
key3 3
key2 2
Time taken: 2.253 seconds, Fetched 4 row(s)

spark-sql> select substring(key,1,3) from kv;
key
key
key
key
Time taken: 1.328 seconds, Fetched 4 row(s)

spark-sql> select substring(key,1,3),count(*) from kv group by substring(key,1,3);
key 4
Time taken: 3.518 seconds, Fetched 1 row(s)
spark-sql>
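
The deprecation warning in the session above suggests CREATE TEMPORARY VIEW instead of CREATE TEMPORARY TABLE. As a rough sketch in the Python style of Answer 2, the statement could be generated from the same options and then pasted into spark-sql or passed to a sql(...) call. The helper name create_temp_view_sql is hypothetical, and the sketch assumes option values contain no double quotes.

```python
from typing import Mapping


def create_temp_view_sql(view: str, source: str, options: Mapping[str, str]) -> str:
    """Build a CREATE TEMPORARY VIEW ... USING ... OPTIONS (...) statement.

    Assumption: option values never contain a double quote, so no escaping
    is attempted; such values are rejected instead.
    """
    for key, value in options.items():
        if '"' in value:
            raise ValueError(f"option {key!r} contains a double quote")
    opts = ", ".join(f'{key} "{value}"' for key, value in options.items())
    return f"CREATE TEMPORARY VIEW {view} USING {source} OPTIONS ({opts})"


# Same table, keyspace, and cluster as in the session above.
stmt = create_temp_view_sql(
    "kv",
    "org.apache.spark.sql.cassandra",
    {
        "table": "kv",
        "keyspace": "mykeyspace",
        "cluster": "Test Cluster",
        "pushdown": "true",
    },
)
print(stmt)
```

The printed statement keeps the options in the order given, so it mirrors the original CREATE TEMPORARY TABLE line with only the keyword changed.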