I have used `show partitions` in Spark SQL, which gives me the following:
year=2019/month=1/day=21
year=2019/month=1/day=22
year=2019/month=1/day=23
year=2019/month=1/day=24
year=2019/month=1/day=25
year=2019/month=1/day=26
year=2019/month=2/day=27
- I need to extract the latest partition.
- I need the year, month and day separately so I can use them as variables in another dataframe, i.e.:
part_year=2019
part_month=1
part_day=29
I have used:

val overwrite2 = overwrite.select(substring(col("partition"), 6, 10) as "year")
from which I get
2019/month
To remove the trailing `/month` I create another dataframe and use `regexp_replace` to replace `month` with a blank string, so yet another dataframe is created.
This in turn creates a lot of overhead. What I want is for all these steps to be done in one dataframe, so that I get the resultant dataframe as:
part_year=2019
part_month=2
part_day=27
with the latest partition being picked up.
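One way to do all of this in a single dataframe is to pull the three numbers out of the `partition` column with `regexp_extract`, then sort and keep the top row. This is a sketch, not code from the question: the table name `mydb.mytable` is illustrative, and it assumes an active `SparkSession` named `spark`.

```scala
import org.apache.spark.sql.functions._

// SHOW PARTITIONS returns a single string column named "partition",
// e.g. "year=2019/month=2/day=27".
val latest = spark.sql("show partitions mydb.mytable")
  .withColumn("part_year",  regexp_extract(col("partition"), "year=(\\d+)", 1).cast("int"))
  .withColumn("part_month", regexp_extract(col("partition"), "month=(\\d+)", 1).cast("int"))
  .withColumn("part_day",   regexp_extract(col("partition"), "day=(\\d+)", 1).cast("int"))
  .orderBy(desc("part_year"), desc("part_month"), desc("part_day"))
  .limit(1)

// Pull the three values out as plain Scala variables for use elsewhere.
val row = latest.head()
val partYear  = row.getInt(row.fieldIndex("part_year"))
val partMonth = row.getInt(row.fieldIndex("part_month"))
val partDay   = row.getInt(row.fieldIndex("part_day"))
```

This avoids the chain of intermediate dataframes: the extraction, the cleanup, and the "latest" selection all happen in one plan.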
You can parse the partition values into `DateTime` objects and sort with `isAfter` to get the latest partition, as in the example below. After

spark.sql(s"show Partitions $yourtablename")

you will get a dataframe; `collect` it, and since it is small data there is no issue. Once you collect the dataframe's partitions you will get an array of strings like the one in your question.

Final result: your most recent date is

2019-02-27

Now, based on this, you can query the hive data in an optimized way.
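The collect-and-sort step can be sketched like this. It is a minimal example, assuming `partitions` holds the strings collected from `show partitions` (here hard-coded from the question), and it uses `java.time.LocalDate` with `isAfter` for the comparison:

```scala
import java.time.LocalDate

// Strings as returned by `show partitions`, collected to the driver.
val partitions = Array(
  "year=2019/month=1/day=21",
  "year=2019/month=1/day=22",
  "year=2019/month=2/day=27")

// Parse "year=Y/month=M/day=D" into a LocalDate.
def toDate(p: String): LocalDate = {
  val kv = p.split("/").map(_.split("=")).map(a => a(0) -> a(1).toInt).toMap
  LocalDate.of(kv("year"), kv("month"), kv("day"))
}

// isAfter gives the ordering; reduce keeps the most recent date.
val latest = partitions.map(toDate).reduce((a, b) => if (a.isAfter(b)) a else b)
// latest is 2019-02-27
```

With `latest` in hand you can build the partition predicate (`year=2019 AND month=2 AND day=27`) and query only that partition.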