How to get a normalised slope of a trend

Posted 2020-05-01 08:41

Question:

I am analysing the distances of users to userx over 6 weeks in a social network.

Note: 'No path' means the two users are not connected yet (not even through friends of friends).

          week1     week2     week3     week4     week5     week6
user1     No path   No path   No path   No path   3         1
user2     No path   No path   No path   5         3         1
user3     5         4         4         4         4         3
userN     ...
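As a minimal sketch, the table above could be held in Python with `None` standing for 'No path' (the dictionary layout and the helper name `observed_points` are hypothetical, not from the question), keeping only the weeks where a distance exists:

```python
# Hypothetical in-memory version of the table: None stands for "No path".
distances = {
    "user1": [None, None, None, None, 3, 1],
    "user2": [None, None, None, 5, 3, 1],
    "user3": [5, 4, 4, 4, 4, 3],
}


def observed_points(row):
    """Return (weeks, distances) for the weeks where a path exists.

    Weeks are numbered from 1 to match week1..week6 in the table.
    """
    weeks = [week for week, d in enumerate(row, start=1) if d is not None]
    values = [d for d in row if d is not None]
    return weeks, values


for user, row in distances.items():
    print(user, observed_points(row))
```

For user1 this yields weeks `[5, 6]` and distances `[3, 1]`, which are exactly the `X` and `y` used in the regression below.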

I want to see how well the users connect with userx.

For that, I initially thought of using the regression slope for the interpretation (i.e. the lower the regression slope, the better).

For example, consider user1 and user2; their regression slopes are calculated as follows.

user1:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X = [[5], [6]] #distance available only for week5 and week6
y = [3, 1]
regressor.fit(X, y)
print(regressor.coef_)

Output is -2.

user2:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X = [[4], [5], [6]] #distance available only for week4, week5 and week6
y = [5, 3, 1]
regressor.fit(X, y)
print(regressor.coef_)

Output is -2.
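Rather than fitting each user by hand, the same slopes can be computed in a loop. This sketch swaps in `numpy.polyfit` with degree 1, which gives the same ordinary least-squares slope as `LinearRegression().coef_` for a single feature; the data layout and the name `distance_slope` are hypothetical:

```python
import numpy as np

# Hypothetical in-memory version of the table: None stands for "No path".
distances = {
    "user1": [None, None, None, None, 3, 1],
    "user2": [None, None, None, 5, 3, 1],
    "user3": [5, 4, 4, 4, 4, 3],
}


def distance_slope(row):
    """Least-squares slope of distance vs. week over the connected weeks only."""
    weeks = [w for w, d in enumerate(row, start=1) if d is not None]
    values = [d for d in row if d is not None]
    if len(weeks) < 2:
        return None  # a line cannot be fitted through fewer than two points
    slope, _intercept = np.polyfit(weeks, values, 1)
    return float(slope)


slopes = {user: distance_slope(row) for user, row in distances.items()}
print(slopes)
```

user1 and user2 both come out at about -2, reproducing the tie described in the question; user3 gets a much flatter slope of about -0.29.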

As you can see, both users get the same slope. However, user2 connected with userx a week earlier than user1, so user2 should be rewarded in some way.
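The slope alone cannot express this, because it ignores how many weeks a path has existed. A minimal sketch that puts that information next to the slope, using the same hypothetical data layout (the helper `connection_summary` is an illustrative name):

```python
# Hypothetical in-memory version of the table: None stands for "No path".
distances = {
    "user1": [None, None, None, None, 3, 1],
    "user2": [None, None, None, 5, 3, 1],
    "user3": [5, 4, 4, 4, 4, 3],
}


def connection_summary(row):
    """Return (first connected week, number of connected weeks), or (None, 0)."""
    connected = [w for w, d in enumerate(row, start=1) if d is not None]
    if not connected:
        return None, 0
    return connected[0], len(connected)


summary = {user: connection_summary(row) for user, row in distances.items()}
print(summary)
```

user1 is first connected in week 5 (2 connected weeks) and user2 in week 4 (3 connected weeks), which is the difference the slope misses.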

Therefore, I am wondering whether there is a better way to measure this.

I am happy to provide more details if needed.

Answer 1:

Well, if you want to reward the duration of the connection, you probably need to take time into account. The easiest and most straightforward way is to multiply the coefficient by the number of connected weeks:

outcome_measure = regressor.coef_[0] * len(y)  # slope times number of connected weeks

If you divide it by 2, it becomes conceptually similar to the area under the curve (AUC):

outcome_measure = (regressor.coef_[0] * len(y)) / 2  # halved version of the measure above

So you would get -4 (user1) and -6 (user2) with the first method, or -2 and -3 with the second.
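A minimal sketch reproducing these numbers for user1 and user2. It uses `numpy.polyfit` (degree 1) in place of `LinearRegression`, which gives the same least-squares slope for a single feature; the `observations` dictionary is a hypothetical layout:

```python
import numpy as np

# Connected weeks and distances for the two users from the question.
observations = {
    "user1": ([5, 6], [3, 1]),
    "user2": ([4, 5, 6], [5, 3, 1]),
}

measures = {}
for user, (weeks, values) in observations.items():
    slope = np.polyfit(weeks, values, 1)[0]  # same OLS slope as regressor.coef_[0]
    scaled = slope * len(values)             # first method: slope times number of weeks
    halved = scaled / 2                      # second method: the same value divided by 2
    measures[user] = (float(scaled), float(halved))
    print(user, round(float(scaled), 6), round(float(halved), 6))
```

Because a lower value is better here, user2 (-6 or -3) now ranks ahead of user1 (-4 or -2).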

Slightly off-topic, but IF you use linear regression for statistical analysis (not just to get the coefficient), I would probably add some kind of check to confirm that its assumptions hold.
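One simple check is to look at the residuals of each fit for obvious patterns. The sketch below is only an illustration of that idea, applied to user3 because it is the only user in the example with more than three connected weeks; with two points (user1) the line always fits perfectly, so the residuals carry no information, and with so few weeks per user such checks can only flag gross problems:

```python
import numpy as np

# user3 from the table: a path exists in all six weeks.
weeks = np.array([1, 2, 3, 4, 5, 6])
values = np.array([5, 4, 4, 4, 4, 3])

slope, intercept = np.polyfit(weeks, values, 1)
fitted = slope * weeks + intercept
residuals = values - fitted  # observed minus fitted distance for each week

for week, r in zip(weeks, residuals):
    print(f"week{week}: residual {r:+.3f}")
```

For user3 the residuals alternate in sign with no strong trend, which is what you would hope to see if a straight line is a reasonable description.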