Transition matrix for counts and proportions (Python)

Posted 2019-08-20 19:08

I have a matrix with the grades from a class for different years (rows for years and columns for grades). What I want is to build a transition matrix with the change between years.

For instance, I want year t-1 on the y-axis and year t on the x-axis, and then a transition matrix with the difference in the number of people with grade A between year t-1 and t, grade B between year t-1 and t, and so on. Then I want a second transition matrix with the proportions, for example: between year t-1 and t there are z% more/fewer people with grade A/B/C/D/F.

Obviously the most important part is the diagonal, which would represent the change for the same grade between different years.

I want this to be done in Python.

Thank you very much, I hope everything is clear.

Result example: (image not included)

1 answer
小情绪 Triste *
Post #2 · 2019-08-20 19:44

You can use the pandas method df.diff for the year-over-year differences, and numpy can generate the matrix of all pairwise differences with np.subtract.outer. Below is an example.
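A minimal check of what the two calls compute, using the grade-A counts (5, 7, 3 for 2015 to 2017) that appear in the example output. The two sign conventions differ: diff gives the later year minus the earlier one, while np.subtract.outer(a, a)[i, j] is a[i] - a[j], i.e. row minus column. So the (2015, 2016) entry of the outer matrix is -2, the negative of the +2 that diff reports for 2016.

```python
import numpy as np
import pandas as pd

# Grade-A counts for 2015, 2016, 2017, taken from the example output
counts = pd.Series([5, 7, 3], index=["2015", "2016", "2017"])

# Series.diff / DataFrame.diff: value in year t minus value in year t-1
yearly = counts.diff()
print(yearly.tolist())  # [nan, 2.0, -4.0]

# np.subtract.outer(a, a)[i, j] == a[i] - a[j] (row minus column)
outer = np.subtract.outer(counts.to_numpy(), counts.to_numpy())
print(outer)
# [[ 0 -2  2]
#  [ 2  0  4]
#  [-2 -4  0]]
```

To read an outer-matrix entry as "change from the row year to the column year", negate it or swap the arguments.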

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
years = ['2015', '2016', '2017']
grades = ['A', 'B', 'C', 'D']

# random counts between 0 and 9, so your numbers will differ from the output below
df = pd.DataFrame(np.random.randint(0, 10, (3, 4)), columns=grades, index=years)

print(df)

      A  B  C  D
2015  5  0  2  0
2016  7  2  0  2
2017  3  7  6  7

df_diff = df.diff(axis=0)
print(df_diff)

Each row of df_diff is the difference between that row and the preceding row of the original df. The first row is NaN because it has no preceding year.

        A    B    C    D
2015  NaN  NaN  NaN  NaN
2016  2.0  2.0 -2.0  2.0
2017 -4.0  5.0  6.0  5.0
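The question also asks for proportions. One way to get them, sketched here and not part of the answer's code, is to divide each year's difference by the previous year's count: df.diff() / df.shift() gives the relative change per grade, and multiplying by 100 turns it into a percentage. The counts are hard-coded from the example output so the numbers are reproducible; for instance grade A goes from 5 to 7 between 2015 and 2016, a 40% increase. A grade with zero students in year t-1 has no defined percentage change, and pandas returns inf (or NaN for 0/0) there instead of raising.

```python
import pandas as pd

# Counts copied from the example output (rows: years, columns: grades)
df = pd.DataFrame(
    [[5, 0, 2, 0], [7, 2, 0, 2], [3, 7, 6, 7]],
    index=["2015", "2016", "2017"],
    columns=["A", "B", "C", "D"],
)

# Absolute change: count in year t minus count in year t-1
df_diff = df.diff()

# Relative change in percent: (x_t - x_{t-1}) / x_{t-1} * 100
# A zero count in year t-1 gives inf instead of an exception
df_pct = df_diff / df.shift() * 100
print(df_pct.round(1))
```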

# flatten the counts row by row (2015A, 2015B, ..., 2017D) with matching labels
a = df.to_numpy(dtype=float).ravel()
differences = [y + g for y in years for g in grades]

# df3.loc[r, c] is the count of r minus the count of c
df3 = pd.DataFrame(np.subtract.outer(a, a), columns=differences, index=differences)
print(df3)

       2015A  2015B  2015C  2015D  2016A  2016B  2016C  2016D  2017A  2017B  2017C  2017D
2015A    0.0    5.0    3.0    5.0   -2.0    3.0    5.0    3.0    2.0   -2.0   -1.0   -2.0
2015B   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
2015C   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
2015D   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
2016A    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
2016B   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
2016C   -5.0    0.0   -2.0    0.0   -7.0   -2.0    0.0   -2.0   -3.0   -7.0   -6.0   -7.0
2016D   -3.0    2.0    0.0    2.0   -5.0    0.0    2.0    0.0   -1.0   -5.0   -4.0   -5.0
2017A   -2.0    3.0    1.0    3.0   -4.0    1.0    3.0    1.0    0.0   -4.0   -3.0   -4.0
2017B    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
2017C    1.0    6.0    4.0    6.0   -1.0    4.0    6.0    4.0    3.0   -1.0    0.0   -1.0
2017D    2.0    7.0    5.0    7.0    0.0    5.0    7.0    5.0    4.0    0.0    1.0    0.0
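df3 compares every year-grade pair with every other one, whereas the question asks for a grade-by-grade matrix for one pair of years, with year t-1 on the rows, year t on the columns, and the same-grade change on the diagonal. The sketch below is one interpretation of that request, not the answer's method: entry (g1, g2) is the count of grade g2 in year t minus the count of grade g1 in year t-1, so the diagonal equals the matching row of df.diff, and each entry is the negative of the corresponding df3 entry (e.g. 1.0 for (2016B, 2017A) versus -1.0 in df3). The proportion matrix divides each row by its year t-1 count. The helper name transition_matrices is hypothetical.

```python
import numpy as np
import pandas as pd

# Counts copied from the example output (rows: years, columns: grades)
df = pd.DataFrame(
    [[5, 0, 2, 0], [7, 2, 0, 2], [3, 7, 6, 7]],
    index=["2015", "2016", "2017"],
    columns=["A", "B", "C", "D"],
)

def transition_matrices(df, prev_year, year):
    """Hypothetical helper: grade-by-grade change between two years.

    Rows are grades in prev_year, columns are grades in year.
    counts[g1, g2] = count of g2 in `year` - count of g1 in `prev_year`
    pct[g1, g2]    = counts[g1, g2] / count of g1 in `prev_year` * 100
    """
    prev = df.loc[prev_year].to_numpy(dtype=float)
    curr = df.loc[year].to_numpy(dtype=float)
    # Broadcasting: curr as a row vector minus prev as a column vector
    diff = curr[np.newaxis, :] - prev[:, np.newaxis]
    # A zero count in prev_year gives inf (or nan for 0/0); silence the warnings
    with np.errstate(divide="ignore", invalid="ignore"):
        pct = diff / prev[:, np.newaxis] * 100
    rows = [f"{prev_year}{g}" for g in df.columns]
    cols = [f"{year}{g}" for g in df.columns]
    return (pd.DataFrame(diff, index=rows, columns=cols),
            pd.DataFrame(pct, index=rows, columns=cols))

counts, pct = transition_matrices(df, "2016", "2017")
print(counts)
print(pct.round(1))
# The diagonal is the same-grade change, i.e. the 2017 row of df.diff()
print(np.diag(counts.to_numpy()))
```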

Plot this matrix with matshow from matplotlib:

plt.matshow(df3)
plt.colorbar()
plt.show()
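On its own, matshow labels the axes with integer positions only. A minimal variation, sketched under the assumption that you want the year-grade names on both axes, sets the tick labels explicitly and uses a diverging colormap so increases and decreases stand apart. It rebuilds df3 from the example counts so it runs on its own; the Agg backend and the output filename df3_matshow.png are arbitrary choices for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild df3 from the example counts so this sketch is self-contained
years = ["2015", "2016", "2017"]
grades = ["A", "B", "C", "D"]
df = pd.DataFrame([[5, 0, 2, 0], [7, 2, 0, 2], [3, 7, 6, 7]],
                  index=years, columns=grades)
a = df.to_numpy(dtype=float).ravel()
labels = [y + g for y in years for g in grades]
df3 = pd.DataFrame(np.subtract.outer(a, a), index=labels, columns=labels)

fig, ax = plt.subplots(figsize=(7, 6))
# Diverging colormap: negative and positive differences get opposite hues
im = ax.matshow(df3.to_numpy(), cmap="coolwarm")
fig.colorbar(im, ax=ax)
# Label every row and column with its year-grade name
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=90)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
fig.tight_layout()
fig.savefig("df3_matshow.png")
```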

