I'm trying to filter the date range from the following data using Data bricks, which returns null as response.
My csv data looks like:
ID, Desc, Week_Ending_Date
100, AAA, 13-06-2015
101, BBB, 11-07-2015
102, CCC, 15-08-2015
103, DDD, 05-09-2015
100, AAA, 29-08-2015
100, AAA, 22-08-2015
My query is:
df.select(df("ID"), date_format(df("Week_Ending_Date"), "yyyy-MM-dd"))
.filter(date_format(df("Week_Ending_Date"), "yyyy-MM- dd").between("2015-07-05", "2015-09-02"))
Any help is much appreciated.
From the top of my head, I would have done the following by converting the date column while reading it and then apply the filter using an alias :
import java.text.SimpleDateFormat
val format = new SimpleDateFormat("dd-MM-yyyy")
val data = sc.parallelize(
List((100, "AAA", "13-06-2015"), (101, "BBB", "11-07-2015"), (102, "CCC", "15-08-2015"), (103, "DDD", "05-09-2015"), (100, "AAA", "29-08-2015"), (100, "AAA", "22-08-2015")).toSeq).map {
r =>
val date: java.sql.Date = new java.sql.Date(format.parse(r._3).getTime);
(r._1, r._2, date)
}.toDF("ID", "Desc", "Week_Ending_Date")
data.show
//+---+----+----------------+
//| ID|Desc|Week_Ending_Date|
//+---+----+----------------+
//|100| AAA| 2015-06-13|
//|101| BBB| 2015-07-11|
//|102| CCC| 2015-08-15|
//|103| DDD| 2015-09-05|
//|100| AAA| 2015-08-29|
//|100| AAA| 2015-08-22|
//+---+----+----------------+
val filteredData = data
.select(data("ID"), date_format(data("Week_Ending_Date"), "yyyy-MM-dd").alias("date"))
.filter($"date".between("2015-07-05", "2015-09-02"))
//+---+----------+
//| ID| date|
//+---+----------+
//|101|2015-07-11|
//|102|2015-08-15|
//|100|2015-08-29|
//|100|2015-08-22|
//+---+----------+