Individuals (indexed from 0 to 5) choose between two locations: A and B. My data has a wide format containing characteristics that vary by individual (ind_var) and characteristics that vary only by location (location_var).
For example, I have:
In [281]:
df_reshape_test = pd.DataFrame( {'location' : ['A', 'A', 'A', 'B', 'B', 'B'], 'dist_to_A' : [0, 0, 0, 50, 50, 50], 'dist_to_B' : [50, 50, 50, 0, 0, 0], 'location_var': [10, 10, 10, 14, 14, 14], 'ind_var': [3, 8, 10, 1, 3, 4]})
df_reshape_test
Out[281]:
   dist_to_A  dist_to_B  ind_var location  location_var
0          0         50        3        A            10
1          0         50        8        A            10
2          0         50       10        A            10
3         50          0        1        B            14
4         50          0        3        B            14
5         50          0        4        B            14
The variable 'location' is the location chosen by the individual. dist_to_A is the distance to location A from the chosen location (and likewise for dist_to_B).
I'd like my data to have this form:
   choice  dist_S  ind_var location  location_var
0       1       0        3        A            10
0       0      50        3        B            14
1       1       0        8        A            10
1       0      50        8        B            14
2       1       0       10        A            10
2       0      50       10        B            14
3       0      50        1        A            10
3       1       0        1        B            14
4       0      50        3        A            10
4       1       0        3        B            14
5       0      50        4        A            10
5       1       0        4        B            14
where choice == 1 indicates that the individual has chosen that location, and dist_S is the distance to that location from the location chosen.
I read about the .stack method but couldn't figure out how to apply it for this case. Thanks for your time!
NOTE: this is just a simple example. The datasets I'm looking at have varying numbers of locations and of individuals per location, so I'm looking for a flexible solution if possible.
In fact, pandas has a wide_to_long command that can conveniently do what you intend to do. This gives you the table you asked for:
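A minimal sketch of how pd.wide_to_long might be applied to the example frame from the question. The names wide, long_df, chosen and ind are made up for illustration, and the location_var remapping is an assumption about what the long table should hold (in the wide data it only describes the chosen location).

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A': [0, 0, 0, 50, 50, 50],
    'dist_to_B': [50, 50, 50, 0, 0, 0],
    'location_var': [10, 10, 10, 14, 14, 14],
    'ind_var': [3, 8, 10, 1, 3, 4],
})

# wide_to_long needs an explicit id column, and the new 'location' column
# must not clash with the existing one, so the chosen location is renamed.
wide = (df.rename(columns={'location': 'chosen'})
          .rename_axis('ind')
          .reset_index())

# dist_to_A / dist_to_B share the stub 'dist_to'; the suffix (A, B, ...)
# becomes the new 'location' level. suffix=r'\w+' allows non-numeric suffixes.
long_df = pd.wide_to_long(wide, stubnames='dist_to', i='ind', j='location',
                          sep='_', suffix=r'\w+')
long_df = (long_df.sort_index()
                  .reset_index()
                  .rename(columns={'dist_to': 'dist_S'}))

# Assumption: location_var should describe the alternative in each row,
# so look it up per location instead of keeping the chosen location's value.
loc_var = wide.drop_duplicates('chosen').set_index('chosen')['location_var']
long_df['location_var'] = long_df['location'].map(loc_var)

print(long_df[['ind', 'location', 'chosen', 'dist_S', 'ind_var', 'location_var']])
```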
Of course you can generate a choice variable that takes 0 and 1 if that's what you want.
Ok, this took longer than I expected, but here's a more general answer that works with an arbitrary number of choices per individual. I'm sure there are simpler ways, so it would be great if somebody can chime in with something better for some of the following code.
which gives
Then we do:
Now I create a MultiIndex and reindex to get a long shape:
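A rough sketch of what such a reindex could look like on the question's example data. The names wide, full_index and long_df are illustrative, and it assumes every location shows up at least once in the location column.

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A': [0, 0, 0, 50, 50, 50],
    'dist_to_B': [50, 50, 50, 0, 0, 0],
    'location_var': [10, 10, 10, 14, 14, 14],
    'ind_var': [3, 8, 10, 1, 3, 4],
})

# Index the wide frame by (individual, chosen location).
wide = df.rename_axis('ind').set_index('location', append=True)

# Full grid: every individual paired with every location.
# Assumption: each location appears at least once as a chosen location.
locations = sorted(df['location'].unique())
full_index = pd.MultiIndex.from_product([df.index, locations],
                                        names=['ind', 'location'])

# Reindexing adds a row for each (individual, location) pair;
# pairs that were not chosen come out as NaN in every column.
long_df = wide.reindex(full_index)
print(long_df)
```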
The long shape looks like this:
So now fill the NaN values with the dictionaries:
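One way this fill could be done, as a sketch on the question's example data. The dictionaries ind_var_by_ind and loc_var_by_loc are hypothetical names, built here from the wide frame; the distance columns are left as they are.

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A': [0, 0, 0, 50, 50, 50],
    'dist_to_B': [50, 50, 50, 0, 0, 0],
    'location_var': [10, 10, 10, 14, 14, 14],
    'ind_var': [3, 8, 10, 1, 3, 4],
})

# Long shape with NaN rows for the locations that were not chosen.
full_index = pd.MultiIndex.from_product(
    [df.index, sorted(df['location'].unique())], names=['ind', 'location'])
long_df = (df.rename_axis('ind')
             .set_index('location', append=True)
             .reindex(full_index)
             .reset_index())

# Lookup dictionaries (hypothetical names): individual -> ind_var, location -> location_var.
ind_var_by_ind = df['ind_var'].to_dict()
loc_var_by_loc = (df.drop_duplicates('location')
                    .set_index('location')['location_var']
                    .to_dict())

# Fill only the missing entries, then restore the integer dtype lost to NaN.
long_df['ind_var'] = (long_df['ind_var']
                      .fillna(long_df['ind'].map(ind_var_by_ind))
                      .astype(int))
long_df['location_var'] = (long_df['location_var']
                           .fillna(long_df['location'].map(loc_var_by_loc))
                           .astype(int))
print(long_df[['ind', 'location', 'ind_var', 'location_var']])
```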
Finally, all that is left is creating dist_S.
I'll cheat here and assume I can create a nested dictionary like this one
(This reads: if you're in location A, then location A is at 0 km and location B at 50 km)
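A small sketch of how such a lookup might be used, again on the question's example data. The dictionary name dist and the columns chosen and choice are illustrative; the distances are the ones from the example frame.

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'ind_var': [3, 8, 10, 1, 3, 4],
})

# Hypothetical nested dictionary, read as dist[origin][destination];
# the values are taken from the example data in the question.
dist = {'A': {'A': 0, 'B': 50},
        'B': {'A': 50, 'B': 0}}

# One row per (individual, location) pair.
long_df = pd.DataFrame(
    [(i, loc) for i in df.index for loc in sorted(dist)],
    columns=['ind', 'location'],
)

# Chosen location of each individual, looked up from the wide frame.
long_df['chosen'] = long_df['ind'].map(df['location'])

# Distance from the chosen location to the location in the row.
long_df['dist_S'] = [dist[c][loc]
                     for c, loc in zip(long_df['chosen'], long_df['location'])]

# 1 if the row's location is the one the individual chose, else 0.
long_df['choice'] = (long_df['chosen'] == long_df['location']).astype(int)
print(long_df)
```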
gives the desired result
I'm a bit curious why you'd like it in that format. There's probably a much better way to store your data. But here goes.
It's not going to generalize well, but there are probably alternative (better) ways to get around the uglier parts like generating the choice col.