I am using Spark with IPython and have an RDD that contains data in this format when printed:
print rdd1.collect()
[u'2010-12-08 00:00:00', u'2010-12-18 01:20:00', u'2012-05-13 00:00:00',....]
Each element is a timestamp string, and I want to find the minimum and the maximum in this RDD. How can I do that?
If your RDD consists of datetime objects, what is wrong with simply using
See documentation
This example works for me
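A minimal sketch of that approach, assuming the elements are strings in the `YYYY-MM-DD HH:MM:SS` layout shown in the question and that they are parsed into `datetime` objects before calling the RDD's built-in `min()` and `max()` actions. The names `FMT`, `parse_ts` and `min_max` are hypothetical helpers introduced only for illustration:

```python
from datetime import datetime

# Assumed format, inferred from the sample printed in the question.
FMT = "%Y-%m-%d %H:%M:%S"


def parse_ts(s):
    """Parse one 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(s, FMT)


def min_max(rdd):
    """Return (earliest, latest) for an RDD of timestamp strings."""
    # Cache the parsed RDD because min() and max() each trigger a pass over it.
    parsed = rdd.map(parse_ts).cache()
    return parsed.min(), parsed.max()
```

With the RDD from the question this would be called as `min_max(rdd1)`. Because the strings are zero-padded and ordered year-month-day, comparing them as plain strings gives the same ordering as comparing the parsed values, so `rdd1.min()` and `rdd1.max()` on the raw strings also pick the earliest and latest elements.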
You can, for example, use the aggregate function (for an explanation of how it works, see: What is the equivalent implementation of RDD.groupByKey() using RDD.aggregateByKey()?) or DataFrames:
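Both options can be sketched roughly as follows. This is an illustration under stated assumptions, not the answer's original code: the helper names (`seq_op`, `comb_op`, `min_max_aggregate`, `min_max_dataframe`) are hypothetical, the timestamp format is inferred from the question's sample, and the DataFrame variant assumes an active `SparkSession` is passed in as `spark`.

```python
from datetime import datetime

# Assumed format, inferred from the sample printed in the question.
FMT = "%Y-%m-%d %H:%M:%S"


def seq_op(acc, s):
    """Fold one timestamp string into a (min, max) accumulator."""
    ts = datetime.strptime(s, FMT)
    lo, hi = acc
    return (ts if lo is None or ts < lo else lo,
            ts if hi is None or ts > hi else hi)


def comb_op(a, b):
    """Merge two per-partition (min, max) accumulators."""
    los = [x for x in (a[0], b[0]) if x is not None]
    his = [x for x in (a[1], b[1]) if x is not None]
    return (min(los) if los else None, max(his) if his else None)


def min_max_aggregate(rdd):
    """Single pass over the RDD; (None, None) marks an empty accumulator."""
    return rdd.aggregate((None, None), seq_op, comb_op)


def min_max_dataframe(spark, rdd):
    """Same result via the DataFrame API, casting the strings to timestamps."""
    from pyspark.sql import functions as F  # imported lazily; needs PySpark

    df = spark.createDataFrame(rdd.map(lambda s: (s,)), ["ts"])
    df = df.withColumn("ts", F.col("ts").cast("timestamp"))
    row = df.agg(F.min("ts").alias("lo"), F.max("ts").alias("hi")).first()
    return row["lo"], row["hi"]
```

The `aggregate` version computes both values in one pass, at the cost of handling the empty accumulator by hand. The DataFrame version leaves the parsing and the aggregation to Spark SQL.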