I have a dataframe that contains for each group the number of observations during a certain period. Some groups don't contain all periods, and for these groups I want to append x rows with the missing periods in it. So that each group has a row for all 6 periods
My current df looks something like this:
> ID PERIOD VAlUE
1 1 10
1 2 8
1 3 8
1 4 15
1 5 6
1 6 44
2 1 NONE
3 2 4
3 5 25
I want a dataframe looking like this.
> ID PERIOD VAlUE
1 1 10
1 2 8
1 3 8
1 4 15
1 5 6
1 6 44
2 1 NONE
2 2 NONE
2 3 NONE
2 4 NONE
2 5 NONE
2 6 4
3 1 NONE
3 2 4
3 3 NONE
3 4 NONE
3 5 25
3 6 NONE
So what happenend:
- For ID == 1, nothing happened, because it contained all 6 periods
- For ID == 2, it appended 5, rows for each period that it didn't have in the first df.
- For ID == 2, it appended 4, rows for each period that it didn't have in the first df. So it added rows for the periods 1,3,4 & 6.
I really don't have a clue how to do it, so help would really be appreciated.
You can set the index to 'ID' and 'PERIOD' and then construct a new index by generating the product of both columns and pass this as the new index to
reindex
, this has an optionalfill_value
param which you can set to the strNONE
:So breaking the above down:
You can unstack the results on
PERIOD
and then stack them back with thedropna
option set to False.