How to execute a group of functions on this column "item_value"

Posted 2020-01-19 07:45

Question:

Using Spark SQL 2.4.1:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DateType
import spark.implicits._

val df = Seq(
    ("50312", "2019-03-31", "0.9992019"),
    ("50312", "2018-03-31", "0.9992018"),
    ("50312", "2017-03-31", "0.9992017"))
  .toDF("id", "date", "item_value")
  .withColumn("date", to_date(col("date"), "yyyy-MM-dd").cast(DateType))
  .withColumn("add_months", add_months($"date", -17))


val df2 = df.filter($"date".between(
  to_date(lit("2019-03-31"), "yyyy-MM-dd"),
  add_months(to_date(lit("2019-03-31"), "yyyy-MM-dd"), -17)))
df2.show(20)

val df3  = df.filter($"date".lt(to_date(lit("2019-03-31"),"yyyy-MM-dd")))
             .filter($"date".gt(add_months(to_date(lit("2019-03-31"),"yyyy-MM-dd"),-17)))
df3.show(20)

`between` is not working as expected. What is wrong here, and how do I fix it?

Answer 1:

As mentioned in the comments, `between` expects the lower bound first and then the upper bound, so the arguments in the df2 filter above are swapped: the date 17 months back is the lower bound and should come first.
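A minimal sketch of the corrected filter, using the same data and literals as above (df2Fixed is a name I am introducing here):

// between(lowerBound, upperBound) is inclusive on both ends;
// the date 17 months before 2019-03-31 is the lower bound.
val df2Fixed = df.filter($"date".between(
  add_months(to_date(lit("2019-03-31"), "yyyy-MM-dd"), -17), // lower bound
  to_date(lit("2019-03-31"), "yyyy-MM-dd")))                 // upper bound
df2Fixed.show(20)

This should return the same rows as the df3 workaround in the question, except that `between` is inclusive on both ends while the lt/gt version excludes them.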

One thing I recall, though I cannot find a reference for it right now: when working with dates/timestamps, there was an inconsistency in how inclusiveness was handled.

A bound like 2020-01-01 should have been inclusive (but wasn't on the lower bound), while 2020-01-01 00:00:00 was.
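If you want to verify the behavior on your own Spark version, a quick check (the DataFrame and names here are made up for illustration) is to filter a single row that sits exactly on the lower bound and see whether it survives:

// Hypothetical probe: one row exactly on the lower bound.
val probe = Seq("2020-01-01").toDF("d")
  .withColumn("d", to_date(col("d"), "yyyy-MM-dd"))

// If the row shows up, `between` is inclusive on the lower
// bound for dates in this Spark version.
probe.filter($"d".between(
  to_date(lit("2020-01-01"), "yyyy-MM-dd"),
  to_date(lit("2020-12-31"), "yyyy-MM-dd"))).show()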