I use spark-shell
to do the below operations.
Recently loaded a table with an array column in spark-sql .
Here is the DDL for the same:
create table test_emp_arr{
dept_id string,
dept_nm string,
emp_details Array<string>
}
the data looks something like this
+-------+-------+-------------------------------+
|dept_id|dept_nm| emp_details|
+-------+-------+-------------------------------+
| 10|Finance|[Jon, Snow, Castle, Black, Ned]|
| 20| IT| [Ned, is, no, more]|
+-------+-------+-------------------------------+
I can query the emp_details column something like this :
sqlContext.sql("select emp_details[0] from emp_details").show
Problem
I want to query a range of elements in the collection :
Expected query to work
sqlContext.sql("select emp_details[0-2] from emp_details").show
or
sqlContext.sql("select emp_details[0:2] from emp_details").show
Expected output
+-------------------+
| emp_details|
+-------------------+
|[Jon, Snow, Castle]|
| [Ned, is, no]|
+-------------------+
In pure Scala, if i have an array something as :
val emp_details = Array("Jon","Snow","Castle","Black")
I can get the elements from 0 to 2 range using
emp_details.slice(0,3)
returns me
Array(Jon, Snow,Castle)
I am not able to apply the above operation of the array in spark-sql.
Thanks
Here is a solution using a User Defined Function which has the advantage of working for any slice size you want. It simply builds a UDF function around the scala builtin
slice
method :Example with a sample of your data :
Produces the expected output
You can also register the UDF in your
sqlContext
and use it like thisYou won't need
lit
anymore with this solution