I'll just start with the example and then break down what is happening.
This is a sample input:
DataFrame:
**Name** **No.** **Test** ***Grade***
Bob 2123320 Math Nan
Joe 2832883 English 90
John 2139300 Science 85
Bob 2123320 History 93
John 2234903 Math 99
Hopeful output:
**Name** ********2139300********* ********2234903*******
Math English Science Math English Science
John 0 0 85 99 0 0
Like the title suggests, I am trying to apply multiple indexes. So basically it starts by looking for each name, and then for each name it finds to see how many distinct No.'s it has. In this case it sets a threshold at at least 2 distinct No.'s (which is why only John is outputted and Joe/Bob are not).
Now in each of these distinct No's. I have a specific subset of Tests I want to search for, in this case only {Math, English, Science}. For each of these tests, if the person in question took it in that No., there should be a grade. I would like that grade to be outputted for the test in question and for the tests not taken by that person on that No. I would like it to output some sort of simple marker (i.e if the person only took Math on that day, for English and Science output 0).
So in effect, it first indexes people by the number of distinct No.'s and groups them as such. It then indexes them by type of Test (for which I only want a subset). It finally assigns each person a value for the type of test they took and for the ones they didn't simply outputs an 0.
It's similar to another problem I asked earlier: Grouped Feature Matrix in Python #2- Follow Up
Except now instead of 1's and 0's I have another column with actual values that I would like to output.
Thank you.
EDIT: More sample/Output
**Name** **No.** **Test** ***Grade***
Bob 2123320 Math Nan
Joe 2832883 English 90
John 2139300 Science 85
Bob 2123320 History 93
John 2234903 Math 99
Bob 2932848 English 99
**Name** 2139300 2234903 2123320 2932848
M E S M E S M E S M E S
John 0 0 85 99 0 0 Nan Nan Nan Nan Nan Nan
Bob Nan Nan Nan Nan nan Nan 86 0 0 0 99 0
Let's use:
Filter the dataframe to only those records you are concerned with
Now, reshape dataframe with
set_index
,unstack
, andreindex
:Output:
You can use
pivot_table
:With the dropna flag to include all the NaN columns:
and with fill_value=0