I have a pandas DataFrame that looks like this:
╔═══╦════════════╦═════════════╗
║   ║ VENDOR ID  ║ DATE        ║
╠═══╬════════════╬═════════════╣
║ 1 ║ 33         ║ 01/12/2018  ║
║ 2 ║ 33         ║ 03/12/2018  ║
║ 3 ║ 12         ║ 01/08/2018  ║
║ 4 ║ 12         ║ 01/15/2018  ║
║ 5 ║ 12         ║ 01/23/2018  ║
║ 6 ║ 33         ║ 05/12/2018  ║
║ 7 ║ 89         ║ 01/12/2018  ║
╚═══╩════════════╩═════════════╝
I'm hoping to get a table that gives, for each row, the number of days since the same VENDOR ID last occurred, like so:
╔═══╦════════════╦═════════════╗
║   ║ VENDOR ID  ║ GAP         ║
╠═══╬════════════╬═════════════╣
║ 1 ║ 33         ║ ----------  ║
║ 2 ║ 33         ║ 60          ║
║ 3 ║ 12         ║ ----------  ║
║ 4 ║ 12         ║ 7           ║
║ 5 ║ 12         ║ 8           ║
║ 6 ║ 33         ║ 60          ║
║ 7 ║ 89         ║ ----------  ║
╚═══╩════════════╩═════════════╝
I've been trying to find a way to achieve this using groupby and other tricks, but I can't seem to get anything to work.
I did come up with something that I think might work using two nested for loops or iterrows in pandas, but my dataset is too large for nested loops to be practical.
Does anyone have any ideas?
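For reference, here is a minimal sketch of one loop-free approach that might fit: parse the dates, then take the per-vendor difference between consecutive rows with `groupby(...).diff()`. It assumes the dates are in MM/DD/YYYY format (the `01/15/2018` row suggests so) and that rows already appear in chronological order within each vendor; if they don't, sorting by `DATE` first would be needed. The DataFrame is rebuilt here from the sample above, and the variable name `df` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the index 1..7 mirrors the table.
df = pd.DataFrame(
    {
        "VENDOR ID": [33, 33, 12, 12, 12, 33, 89],
        "DATE": [
            "01/12/2018", "03/12/2018", "01/08/2018", "01/15/2018",
            "01/23/2018", "05/12/2018", "01/12/2018",
        ],
    },
    index=range(1, 8),
)

# Assumption: dates are MM/DD/YYYY.
df["DATE"] = pd.to_datetime(df["DATE"], format="%m/%d/%Y")

# Difference from the previous row of the same vendor, in whole days.
# The first occurrence of each vendor has no previous row, so it gets NaN.
df["GAP"] = df.groupby("VENDOR ID")["DATE"].diff().dt.days

print(df[["VENDOR ID", "GAP"]])
```

With exact calendar arithmetic the two gaps for vendor 33 come out as 59 and 61 days rather than the rounded 60 shown in the desired table, and the first occurrence of each vendor is `NaN` instead of dashes.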