i'm looking for help creating a sub-dataframe from an existing dataframe using a np.nansum-like function. I want to convert this table into a matrix of non-null column sums:
dan ste bob
t1 na 2 na
t2 2 na 1
t3 2 1 na
t4 1 na 2
t5 na 1 2
t6 2 1 na
t7 1 na 2
For example, when 'dan' is not-null (t-2,3,4,6,7) the sum of 'ste' is 2 and 'bob' is 5. When 'ste' is not-null the sum of 'dan' is 4.
dan ste bob
dan 0 2 5
ste 4 0 2
bob 4 1 0
Any ideas?
Thanks in advance!
I ended up using a modified version of matt's function below:
def nansum_matrix_create(df):
rows = []
for col in list(df.columns.values):
col_sums = df[df[col] != 0].sum()
rows.append(col_sums)
return pd.DataFrame(rows, columns=df.columns, index=df.columns)
pd.DataFrame.notnull
to get where non-nulls are.pd.DataFrame.dot
to ge the crosstab.np.eye
to zero out the diagonal.Note:
I used this to ensure my values were numeric.
Assuming your dataframe doesn't have large number of columns, this function should do what you want and be fairly performant. I have implemented this using
for
loop across columns so there may be a more performant / elegant solution out there.