I am analysing the distances of users to userx over 6 weeks in a social network.
Note: 'No path' means the two users are not connected yet (not even through friends of friends).
        week1    week2    week3    week4    week5    week6
user1   No path  No path  No path  No path  3        1
user2   No path  No path  No path  5        3        1
user3   5        4        4        4        4        3
userN   ...
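A minimal sketch of holding the table above in code, assuming 'No path' is stored as NaN so that weeks without a path can be skipped later. The names `WEEKS`, `distances` and `connected_weeks` are illustrative, not part of the original analysis.

```python
import numpy as np

WEEKS = np.arange(1, 7)  # week1 .. week6

# One row per user; np.nan stands for "No path" (not connected yet).
distances = {
    "user1": np.array([np.nan, np.nan, np.nan, np.nan, 3, 1]),
    "user2": np.array([np.nan, np.nan, np.nan, 5, 3, 1]),
    "user3": np.array([5, 4, 4, 4, 4, 3], dtype=float),
}

def connected_weeks(row):
    """Return the weeks and distances where a path to userx exists."""
    mask = ~np.isnan(row)
    return WEEKS[mask], row[mask]

for user, row in distances.items():
    weeks, dist = connected_weeks(row)
    print(user, weeks.tolist(), dist.tolist())
```

Encoding 'No path' as NaN keeps every row the same length (one value per week), which makes it easy to apply the same calculation to all users.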
I want to see how well the users connect with userx.
For that, I initially thought of interpreting the slope of a linear regression of distance against week (i.e. the lower the regression slope, the better).
For example, consider user1 and user2. Their regression slopes are calculated as follows.
user1:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X = [[5], [6]]  # distances are available only for week5 and week6
y = [3, 1]
regressor.fit(X, y)
print(regressor.coef_)
The slope is -2.
user2:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X = [[4], [5], [6]]  # distances are available only for week4, week5 and week6
y = [5, 3, 1]
regressor.fit(X, y)
print(regressor.coef_)
The slope is -2.
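The two fits above can be generalised to every row of the table. This is a minimal sketch, assuming 'No path' is encoded as NaN; it swaps `LinearRegression` for `numpy.polyfit` with `deg=1`, which computes the same ordinary least-squares slope. `connection_slope` and `WEEKS` are illustrative names.

```python
import numpy as np

WEEKS = np.arange(1, 7)  # week1 .. week6

def connection_slope(row):
    """Least-squares slope of distance vs. week, ignoring "No path" (NaN) weeks.

    Returns NaN when fewer than two weeks have a path, since a line
    cannot be fitted through a single point.
    """
    row = np.asarray(row, dtype=float)
    mask = ~np.isnan(row)
    if mask.sum() < 2:
        return float("nan")
    slope, _intercept = np.polyfit(WEEKS[mask], row[mask], deg=1)
    return float(slope)

nan = float("nan")
table = {
    "user1": [nan, nan, nan, nan, 3, 1],
    "user2": [nan, nan, nan, 5, 3, 1],
    "user3": [5, 4, 4, 4, 4, 3],
}
for user, row in table.items():
    print(user, round(connection_slope(row), 3))
```

user1 and user2 both come out at -2, reproducing the tie, while user3 gets a much flatter slope of about -0.286.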
As you can see, both users get the same slope value. However, user2 has been connected with userx a week earlier than user1. Hence, user1 should be rewarded in some way.
Therefore, I am wondering whether there is a better way to measure this.
I am happy to provide more details if needed.