I am trying to compare record of current and previous row in the below DataFrame
. I want to calculate the Amount column.
scala> val dataset = sc.parallelize(Seq((1, 123, 50), (2, 456, 30), (3, 456, 70), (4, 789, 80))).toDF("SL_NO","ID","AMOUNT")
scala> dataset.show
+-----+---+------+
|SL_NO| ID|AMOUNT|
+-----+---+------+
| 1|123| 50|
| 2|456| 30|
| 3|456| 70|
| 4|789| 80|
+-----+---+------+
Calculation Logic:
- For the row no 1, AMOUNT should be 50 from first row.
- For the row no 2, if ID of SL_NO - 2 and 1 is not same then need to consider AMOUNT of SL_NO - 2 (i.e - 30). Otherwise AMOUNT of SL_NO - 1 (i.e. - 50)
- For the row no 3, if ID of SL_NO - 3 and 2 is not same then need to consider AMOUNT of SL_NO - 3 (i.e - 70). Otherwise AMOUNT of SL_NO - 2 (i.e. - 30)
Same logic need to follow for the other rows also.
Expected Output:
+-----+---+------+
|SL_NO| ID|AMOUNT|
+-----+---+------+
| 1|123| 50|
| 2|456| 30|
| 3|456| 30|
| 4|789| 80|
+-----+---+------+
Please help.