I have two data frames that represent two different points in time for the same people. For each row, I'd like to know whether any of the five (fixed) columns var1 to var5 changed between the two data frames.
Before:
+--+------+------+------+------+------+------+
|id| sport|  var1|  var2|  var3|  var4|  var5|
+--+------+------+------+------+------+------+
| 1|soccer|330234|      |      |      |      |
| 2|soccer|  null|  null|  null|  null|  null|
| 3|soccer|330101|      |      |      |      |
| 4|soccer|  null|  null|  null|  null|  null|
| 5|soccer|  null|  null|  null|  null|  null|
| 6|soccer|  null|  null|  null|  null|  null|
| 7|soccer|  null|  null|  null|  null|  null|
| 8|soccer|330024|330401|      |      |      |
| 9|soccer|330055|330106|      |      |      |
|10|soccer|  null|  null|  null|  null|  null|
|11|soccer|390027|      |      |      |      |
|12|soccer|  null|  null|  null|  null|  null|
|13|soccer|330101|      |      |      |      |
|14|soccer|330059|      |      |      |      |
|15|soccer|  null|  null|  null|  null|  null|
|16|soccer|140242|140281|      |      |      |
|17|soccer|330214|      |      |      |      |
|18|soccer|      |      |      |      |      |
|19|soccer|330055|330196|      |      |      |
|20|soccer|210022|      |      |      |      |
+--+------+------+------+------+------+------+
After:
+--+------+------+------+------+------+------+
|id| sport|  var1|  var2|  var3|  var4|  var5|
+--+------+------+------+------+------+------+
| 1|soccer|330234|      |      |      |      |
| 2|soccer|  null|  null|  null|  null|  null|
| 3|soccer|330101|      |      |      |      |
| 4|soccer|  null|  null|  null|  null|  null|
| 5|soccer|  null|  null|  null|  null|  null|
| 6|soccer|  null|  null|  null|  null|  null|
| 7|soccer|  null|  null|  null|  null|  null|
| 8|soccer|  null|  null|  null|  null|  null|
| 9|soccer|330106|      |      |      |      |
|10|soccer|  null|  null|  null|  null|  null|
|11|soccer|390027|      |      |      |      |
|12|soccer|  null|  null|  null|  null|  null|
|13|soccer|  null|  null|  null|  null|  null|
|14|soccer|330128|330331|330106|330059|      |
|15|soccer|  null|  null|  null|  null|  null|
|16|soccer|140242|140281|140010|      |      |
|17|soccer|330214|      |      |      |      |
|18|soccer|  null|  null|  null|  null|  null|
|19|soccer|330196|      |      |      |      |
|20|soccer|210022|      |      |      |      |
+--+------+------+------+------+------+------+
I know how to check for differences between columns within a single row, but I don't know how to compare the corresponding rows of two different data frames.
An ideal output would be:
+--+------+------+
|id| sport|  diff|
+--+------+------+
| 1|soccer|     0|
| 2|soccer|     0|
| 3|soccer|     0|
| 4|soccer|     0|
| 5|soccer|     0|
| 6|soccer|     0|
| 7|soccer|     0|
| 8|soccer|     1|
| 9|soccer|     1|
|10|soccer|     0|
|11|soccer|     0|
|12|soccer|     0|
|13|soccer|     1|
|14|soccer|     1|
|15|soccer|     0|
|16|soccer|     1|
|17|soccer|     0|
|18|soccer|     0|
|19|soccer|     1|
|20|soccer|     0|
+--+------+------+
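
To pin down the rule that the expected output seems to imply, here is a minimal plain-Python sketch of the comparison logic only. It is not data-frame code. It assumes that rows are matched on id, that the five columns are compared position by position, and that null and an empty string count as the same "no value" (which is what id 18 suggests, where all-empty becomes all-null and diff stays 0). The names `VAR_COLS`, `normalize`, `row_diff` and `make_row` are hypothetical helpers introduced just for this illustration, and the sample rows are ids 1, 8, 9 and 18 from the tables above.

```python
from typing import Dict, List, Optional

# The five fixed columns that are compared (taken from the tables above).
VAR_COLS = ["var1", "var2", "var3", "var4", "var5"]

Row = Dict[str, Optional[str]]


def normalize(value: Optional[str]) -> str:
    """Assumption: None (null) and blank strings both mean 'no value'."""
    return "" if value is None else value.strip()


def row_diff(before: Row, after: Row) -> int:
    """Return 1 if any of the five columns differs after normalization, else 0."""
    return int(any(normalize(before.get(c)) != normalize(after.get(c))
                   for c in VAR_COLS))


def make_row(values: List[Optional[str]]) -> Row:
    """Build a {column: value} mapping from up to five values."""
    return dict(zip(VAR_COLS, values))


# A few rows from the "Before" and "After" tables, keyed by id.
before = {
    1: make_row(["330234", "", "", "", ""]),
    8: make_row(["330024", "330401", "", "", ""]),
    9: make_row(["330055", "330106", "", "", ""]),
    18: make_row(["", "", "", "", ""]),
}
after = {
    1: make_row(["330234", "", "", "", ""]),
    8: make_row([None] * 5),
    9: make_row(["330106", "", "", "", ""]),
    18: make_row([None] * 5),
}

# Match rows on id and flag each id with 0 (unchanged) or 1 (changed).
result = {pid: row_diff(before[pid], after[pid]) for pid in sorted(before)}
print(result)  # {1: 0, 8: 1, 9: 1, 18: 0}
```

In a data-frame setting the same idea would presumably become a join of the two frames on id followed by a null-tolerant comparison of each pair of columns, but that translation is not shown here.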