I have a data frame read with the sqlContext.sql function in PySpark. It contains 4 numeric columns with information per client (ClientId is the key). I need to calculate the max value per client across these columns and join this value to the data frame as a new column:
+--------+-------+-------+-------+-------+
|ClientId|m_ant21|m_ant22|m_ant23|m_ant24|
+--------+-------+-------+-------+-------+
| 0| null| null| null| null|
| 1| null| null| null| null|
| 2| null| null| null| null|
| 3| null| null| null| null|
| 4| null| null| null| null|
| 5| null| null| null| null|
| 6| 23| 13| 17| 8|
| 7| null| null| null| null|
| 8| null| null| null| null|
| 9| null| null| null| null|
| 10| 34| 2| 4| 0|
| 11| 0| 0| 0| 0|
| 12| 0| 0| 0| 0|
| 13| 0| 0| 30| 0|
| 14| null| null| null| null|
| 15| null| null| null| null|
| 16| 37| 29| 29| 29|
| 17| 0| 0| 16| 0|
| 18| 0| 0| 0| 0|
| 19| null| null| null| null|
+--------+-------+-------+-------+-------+
In this case, the max value for client "6" is 23 and for client "10" it is 34. Clients whose columns are all null should naturally get null in the new column.
Please show me how I can do this operation.
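
For reference, something along these lines is what I imagine; this is only a sketch using pyspark.sql.functions.greatest, which skips nulls and returns null only when every input is null. The sample rows and the new column name max_m_ant are placeholders for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Toy reproduction of a few rows of the frame above.
    df = spark.createDataFrame(
        [(5, None, None, None, None),
         (6, 23, 13, 17, 8),
         (10, 34, 2, 4, 0)],
        ["ClientId", "m_ant21", "m_ant22", "m_ant23", "m_ant24"],
    )

    # greatest() takes the row-wise maximum, ignoring nulls; it yields
    # null only when all four columns are null, which is the behaviour
    # wanted for the all-null clients.
    df = df.withColumn(
        "max_m_ant",
        F.greatest("m_ant21", "m_ant22", "m_ant23", "m_ant24"),
    )
    df.show()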