I have users and resources. Each resource is described by a set of features, and each user is related to a different set of resources. In my particular case, the resources are web pages, and the features are information about the location of the visit, the time of the visit, the number of visits, etc., which are tied to a specific user each time.
I want to get a similarity measure between my users based on those features, but I can't find a way to aggregate the resource features together. I've done it with text features, where it is possible to concatenate a user's documents and then extract features (say TF-IDF), but I don't know how to proceed with this configuration.
To be as clear as possible, here is what I have:
>>> len(user_features)
13 # that's my number of users
>>> user_features[0].shape
(2374, 17) # 2374 documents for this user, and 17 features
I'm able to get a similarity matrix of the documents of a single user, using Euclidean distances for instance:
>>> euclidean_distance(user_features[0], user_features[0])
But I don't know how to compare the users against each other. I should somehow aggregate the features to end up with an N_Users x N_Features matrix, but I don't know how.
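To illustrate the shape I'm after, here is a minimal sketch that assumes a naive aggregation: each user's documents are collapsed into the per-feature mean and standard deviation. That choice is only an assumption for illustration (it may well throw away too much information), the data is synthetic, and `aggregate_users` and `pairwise_euclidean` are hypothetical helper names, not part of my actual code.

```python
import numpy as np

def aggregate_users(user_features):
    """Collapse each (n_docs, n_features) array into a single row.

    Assumption: per-feature mean and standard deviation are used as the
    summary, so each user ends up with 2 * n_features values.
    """
    rows = [np.concatenate([f.mean(axis=0), f.std(axis=0)]) for f in user_features]
    return np.vstack(rows)

def pairwise_euclidean(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic stand-in for my data: 13 users, a different number of
# documents per user, 17 features per document.
rng = np.random.default_rng(0)
user_features = [rng.normal(size=(int(rng.integers(50, 200)), 17)) for _ in range(13)]

user_matrix = aggregate_users(user_features)  # one row per user
user_dist = pairwise_euclidean(user_matrix)   # user-by-user distance matrix
```

With this kind of summary the features should probably be put on a common scale before the distances are computed, otherwise the features with the largest ranges dominate.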
Any hints on how to proceed?
Some more information about the features I'm using:
The features I have here are not completely fixed. What I've got so far is 13 different features, already aggregated from "views". What I have is the standard deviation, mean, etc. for each of the views, in order to have something "flat" that can be compared. One of the features I have is: has the location changed since the last view? And what about compared to one hour ago? Two hours ago?
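To make the location feature concrete, here is a rough sketch of how such flags could be computed for one user. It assumes views sorted by time, timestamps in seconds, and locations given as plain labels. The function name `location_changed_flags` and the `lag_seconds` parameter are hypothetical, and "one hour ago" is interpreted as "the most recent view at least one hour before this one", which is an assumption.

```python
import numpy as np

def location_changed_flags(timestamps, locations, lag_seconds=3600):
    """Return two boolean arrays, one entry per view.

    changed_last: the location differs from the previous view.
    changed_lag:  the location differs from the most recent view that is
                  at least `lag_seconds` older (False if there is none).
    Assumes `timestamps` is sorted in ascending order.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    locations = np.asarray(locations)

    changed_last = np.zeros(len(locations), dtype=bool)
    changed_last[1:] = locations[1:] != locations[:-1]

    # Index of the latest view with timestamp <= t - lag_seconds, or -1.
    ref = np.searchsorted(timestamps, timestamps - lag_seconds, side="right") - 1
    has_ref = ref >= 0
    changed_lag = np.zeros(len(locations), dtype=bool)
    changed_lag[has_ref] = locations[has_ref] != locations[ref[has_ref]]
    return changed_last, changed_lag

# Four views at 0 min, 30 min, 60 min and 120 min.
changed_last, changed_lag = location_changed_flags(
    [0, 1800, 3600, 7200], ["A", "B", "A", "B"]
)
```

Calling it again with `lag_seconds=7200` would give the two-hour variant of the same flag.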