Spark support for using window functions

Published 2019-07-27 11:16

Question:

I am using Spark version 1.6.0 with Python. I found that window functions are not supported by the Spark build I am using: when I tried to use a window function in my query (via Spark SQL), it gave me an error saying that I need to build Spark with Hive functionality. Following that, I searched around and found suggestions that I need to use Spark version 1.4.0, which I tried with no luck. Some posts also suggested building Spark with Hive functionality, but I did not find the right way to do it.
When I used Spark 1.4.0, I got the following error:

raise ValueError("invalid mode %r (only r, w, b allowed)")
ValueError: invalid mode %r (only r, w, b allowed)
16/04/04 14:17:17 WARN PythonRDD: Incomplete task interrupted: Attempting to kill Python Worker
16/04/04 14:17:17 INFO HadoopRDD: Input split: file:/C:/Users/test
esktop/spark-1.4.0-bin-hadoop2.4/test:910178+910178
16/04/04 14:17:17 INFO Executor: Executor killed task 1.0 in stage 1.0 (TID 2)
16/04/04 14:17:17 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, localhost): TaskKilled (killed intentionally)
16/04/04 14:17:17 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool

Answer 1:

I think this is the third time I have answered a similar question:

  • Using windowing functions in Spark.
  • Window function is not working on Pyspark sqlcontext.

Window functions are supported with HiveContext, not the regular SQLContext.
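
For illustration, here is a minimal PySpark sketch (the sample data and column names are my own, not from the question) that runs a window function through a HiveContext on Spark 1.4+:

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.window import Window
from pyspark.sql import functions as F

sc = SparkContext(appName="window-function-example")
sqlContext = HiveContext(sc)  # HiveContext, not the plain SQLContext

# Toy DataFrame; the data and column names are arbitrary
df = sqlContext.createDataFrame(
    [("a", 1), ("a", 2), ("b", 3)], ["key", "value"])

# Number the rows within each key, ordered by value
w = Window.partitionBy("key").orderBy("value")
df.select("key", "value", F.row_number().over(w).alias("rn")).show()

With a plain SQLContext on a build without Hive support, the same code fails with an error telling you that window functions require a HiveContext, which matches the error described in the question.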

As for how to build Spark with Hive support, the answer is in the official Building Spark documentation:

Building with Hive and JDBC Support: To enable Hive integration for Spark SQL along with its JDBC server and CLI, add the -Phive and -Phive-thriftserver profiles to your existing build options. By default Spark will build with Hive 0.13.1 bindings.

Apache Hadoop 2.4.X with Hive 13 support (example):

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package

Building for Scala 2.11

To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11 property:

./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
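
Once Spark has been built with the -Phive profile, the window function can also be expressed directly in Spark SQL, which is closer to what the question describes. A rough sketch, where the table and column names are assumptions of mine:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="window-sql-example")
sqlContext = HiveContext(sc)

# Toy data registered as a temporary table (registerTempTable is the Spark 1.x API)
df = sqlContext.createDataFrame(
    [("a", 1), ("a", 2), ("b", 3)], ["key", "value"])
df.registerTempTable("events")

# In Spark 1.x, the OVER (...) clause only works when the context is a HiveContext
sqlContext.sql("""
    SELECT key, value,
           row_number() OVER (PARTITION BY key ORDER BY value) AS rn
    FROM events
""").show()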

There is no magic here; everything is in the documentation.