I'm looking for a good way to solve the following problem. My current fix is not particularly clean, and I'm hoping to learn from your insight.
Suppose I have a pandas DataFrame whose entries look like this:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(index=[1, 2, 3], columns=['Color', 'Texture', 'IsGlass'])
>>> df['Color'] = [np.nan, ['Red', 'Blue'], ['Blue', 'Green', 'Purple']]
>>> df['Texture'] = [['Rough'], np.nan, ['Silky', 'Shiny', 'Fuzzy']]
>>> df['IsGlass'] = [1, 0, 1]
>>> df
                         Color                    Texture  IsGlass
1                          NaN                  ['Rough']        1
2              ['Red', 'Blue']                        NaN        0
3  ['Blue', 'Green', 'Purple']  ['Silky','Shiny','Fuzzy']        1
So each observation in the index corresponds to something I measured about its color, texture, and whether it's glass or not. What I'd like to do is turn this into a new "indicator" DataFrame by creating a column for each observed value and setting the corresponding entry to 1 if I observed that value, or NaN if I have no information:
>>> df
   Red  Blue  Green  Purple  Rough  Silky  Shiny  Fuzzy  IsGlass
1  NaN   NaN    NaN     NaN      1    NaN    NaN    NaN        1
2    1     1    NaN     NaN    NaN    NaN    NaN    NaN        0
3  NaN     1      1       1    NaN      1      1      1        1
I have a solution that loops over each column, looks at its values, and, through a series of try/except blocks for non-NaN values, splits the lists, creates a new column for each value, and concatenates the results.
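To make that description concrete, here is a minimal sketch of this kind of loop-based approach. It is a simplified illustration, not the exact code: the helper name `to_indicators` and its `list_columns` parameter are made up for this example, and it assumes every non-NaN entry in those columns is a list of strings.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built in one step.
df = pd.DataFrame(
    {
        'Color': [np.nan, ['Red', 'Blue'], ['Blue', 'Green', 'Purple']],
        'Texture': [['Rough'], np.nan, ['Silky', 'Shiny', 'Fuzzy']],
        'IsGlass': [1, 0, 1],
    },
    index=[1, 2, 3],
)


def to_indicators(frame, list_columns):
    """Hypothetical helper: expand list-valued columns into 1/NaN indicator columns."""
    pieces = []
    for col in list_columns:
        # Rows with no information (NaN) are skipped entirely.
        observed = frame[col].dropna()

        # Collect the distinct values in order of first appearance.
        values = []
        for items in observed:
            for item in items:
                if item not in values:
                    values.append(item)

        # Start from an all-NaN block and mark each observed value with 1.
        block = pd.DataFrame(np.nan, index=frame.index, columns=values)
        for idx, items in observed.items():
            block.loc[idx, items] = 1
        pieces.append(block)

    # Columns that were not expanded (here, IsGlass) are carried over unchanged.
    pieces.append(frame.drop(columns=list_columns))
    return pd.concat(pieces, axis=1)


indicators = to_indicators(df, ['Color', 'Texture'])
print(indicators)
```

This does produce the layout I want, but the nested loops are exactly the part I'd like to replace with something cleaner.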
This is my first post to Stack Overflow, and I hope it conforms to the posting guidelines. Thanks.