I have following table:
+-----+---+----+
|type | t |code|
+-----+---+----+
| A| 25| 11|
| A| 55| 42|
| B| 88| 11|
| A|114| 11|
| B|220| 58|
| B|520| 11|
+-----+---+----+
And what I want:
+-----+---+----+
|t1 | t2|code|
+-----+---+----+
| 25| 88| 11|
| 114|520| 11|
+-----+---+----+
There are two types of events A and B. Event A is the start, Event B is the end. I want to connect the start with the next end dependence of the code.
It's quite easy in SQL to do this:
SELECT a.t AS t1,
(SELECT b.t FROM events AS b WHERE a.code == b.code AND a.t < b.t LIMIT 1) AS t2, a.code AS code
FROM events AS a
But I have to problem to implement this in Spark because it looks like that this kind of nested query isn't supported...
I tried it with:
df.createOrReplaceTempView("events")
val sqlDF = spark.sql(/* SQL-query above */)
Error i get:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Accessing outer query column is not allowed in:
Do you have any other ideas to solve that problem?