Here is a MATLAB coding problem (a slightly different version, using intersect instead of setdiff, is here):
I have a rating matrix A with 3 columns: the 1st column is the user ID, which may be duplicated; the 2nd column is the item ID, which may also be duplicated; the 3rd column is the rating the user gave the item, ranging from 1 to 5.
Now I have a subset of user IDs, smallUserIDList, and a subset of item IDs, smallItemIDList. I want to find the rows in A rated by the users in smallUserIDList, collect the items each user rated, and do some calculation on them, such as taking the setdiff with smallItemIDList and counting the result, as the following code does:
userStat = zeros(length(smallUserIDList), 1);
for i = 1:length(smallUserIDList)
    A2 = A(A(:,1) == smallUserIDList(i), :);
    itemIDList_each = unique(A2(:,2));
    setDiff = setdiff(itemIDList_each, smallItemIDList);
    userStat(i) = length(setDiff);
end
userStat
The profile viewer shows that the loop above is inefficient. The question is: how can I improve this piece of code with vectorization, without the help of a for loop?
For example:
Input:
A = [
1 11 1
2 22 2
2 66 4
4 44 5
6 66 5
7 11 5
7 77 5
8 11 2
8 22 3
8 44 3
8 66 4
8 77 5
]
smallUserIDList = [1 2 7 8]
smallItemIDList = [11 22 33 55 77]
Output:
userStat =
0
1
0
2
This could be one vectorized approach (benchmarked below) -
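The answer's own code did not survive in this text, so the following is only a minimal sketch of how such a vectorized approach might look, not necessarily what was posted. It assumes smallUserIDList contains no repeated IDs, and it uses unique(...,'rows') so that repeated (user,item) ratings are counted once, as the loop does.

```matlab
% Example data from the question
A = [1 11 1; 2 22 2; 2 66 4; 4 44 5; 6 66 5; 7 11 5; ...
     7 77 5; 8 11 2; 8 22 3; 8 44 3; 8 66 4; 8 77 5];
smallUserIDList = [1 2 7 8];
smallItemIDList = [11 22 33 55 77];

% Distinct (user,item) pairs, mirroring unique() inside the loop
pairs = unique(A(:,1:2), 'rows');

% Position of each pair's user in smallUserIDList (0 if not listed)
[tf, loc] = ismember(pairs(:,1), smallUserIDList);

% Pairs whose item is NOT in smallItemIDList (what setdiff keeps)
notSmall = ~ismember(pairs(:,2), smallItemIDList);

% Count the surviving pairs per listed user
userStat = accumarray(loc(tf & notSmall), 1, [numel(smallUserIDList) 1]);
% userStat is [0; 1; 0; 2] for the example above
```

The size argument to accumarray keeps a zero entry for listed users with no remaining items.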
The code listed next compares runtimes for the proposed approach against the original loopy code -
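The benchmarking code itself is missing here. A rough sketch of such a comparison could look like the following; the data sizes and random inputs are made up for illustration and are not the cases from the original post, and the vectorized variant (unique rows plus accumarray) is an assumption rather than the answer's actual code.

```matlab
% Hypothetical problem size, chosen only for illustration
rng(0);                                    % reproducible random data
nRatings = 20000; nUsers = 500; nItems = 200;
A = [randi(nUsers, nRatings, 1), randi(nItems, nRatings, 1), randi(5, nRatings, 1)];
smallUserIDList = randperm(nUsers, 100);   % 100 distinct user IDs
smallItemIDList = randperm(nItems, 50);    % 50 distinct item IDs

% Original loop from the question
tic;
userStatLoop = zeros(numel(smallUserIDList), 1);
for i = 1:numel(smallUserIDList)
    A2 = A(A(:,1) == smallUserIDList(i), :);
    userStatLoop(i) = numel(setdiff(unique(A2(:,2)), smallItemIDList));
end
tLoop = toc;

% Vectorized variant (assumed, see lead-in)
tic;
pairs = unique(A(:,1:2), 'rows');
[tf, loc] = ismember(pairs(:,1), smallUserIDList);
notSmall = ~ismember(pairs(:,2), smallItemIDList);
userStatVec = accumarray(loc(tf & notSmall), 1, [numel(smallUserIDList) 1]);
tVec = toc;

% Both versions must agree before the timings mean anything
assert(isequal(userStatLoop, userStatVec));
fprintf('loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```

tic/toc keeps the script flat; wrapping each version in a function and using timeit would likely give steadier numbers.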
The runtimes thus obtained with three sets of data sizes were -
Case #1:
Case #2:
Case #3:
Conclusion: The speedups with the proposed approach over the original loopy code thus seem to be huge!!
Vanilla MATLAB:
As far as I can tell your code is equivalent to:
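The code that followed is not present in this text. One way to write such an equivalent in plain MATLAB, offered only as a sketch under the assumption that smallUserIDList has no repeated IDs, is to drop the rows whose item is in smallItemIDList and then count the remaining rows per listed user:

```matlab
% Example data from the question
A = [1 11 1; 2 22 2; 2 66 4; 4 44 5; 6 66 5; 7 11 5; ...
     7 77 5; 8 11 2; 8 22 3; 8 44 3; 8 66 4; 8 77 5];
smallUserIDList = [1 2 7 8];
smallItemIDList = [11 22 33 55 77];

% Rows whose item is not in smallItemIDList
keep = ~ismember(A(:,2), smallItemIDList);

% Map the remaining rows to positions in smallUserIDList (0 = not listed)
[tf, loc] = ismember(A(keep,1), smallUserIDList);

% Count rows per listed user; no unique() step, so repeated
% (user,item) rows would be counted more than once
userStat = accumarray(loc(tf), 1, [numel(smallUserIDList) 1]);
% userStat is [0; 1; 0; 2] for the example above
```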
This will work if there is at most one rating per (user,item)-combination. Also it should be quite efficient.
Clean approach without reinventing the wheel:
Check out grpstats from the Statistics Toolbox! An implementation could look similar to this:
I think you are trying to remove a fixed set of ratings for a subset of users and count the number of remaining ratings:
Does the following work?
You need the allcomb function from the File Exchange on MATLAB Central; it gives the Cartesian product of two vectors and is easy to implement anyway.
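The code for this answer is missing, so what follows is a guess at the idea rather than the posted solution: build every (user,item) pair from the two small lists, remove those pairs from the ratings, and count what is left per user. ndgrid is used here as a built-in stand-in for allcomb, and smallUserIDList is assumed to have no repeated IDs.

```matlab
% Example data from the question
A = [1 11 1; 2 22 2; 2 66 4; 4 44 5; 6 66 5; 7 11 5; ...
     7 77 5; 8 11 2; 8 22 3; 8 44 3; 8 66 4; 8 77 5];
smallUserIDList = [1 2 7 8];
smallItemIDList = [11 22 33 55 77];

% Cartesian product of the two lists (stand-in for allcomb)
[U, I] = ndgrid(smallUserIDList, smallItemIDList);
removed = [U(:), I(:)];

% Distinct (user,item) pairs of A that are not in the product
remaining = setdiff(A(:,1:2), removed, 'rows');

% Count the remaining pairs for each listed user
[tf, loc] = ismember(remaining(:,1), smallUserIDList);
userStat = accumarray(loc(tf), 1, [numel(smallUserIDList) 1]);
% userStat is [0; 1; 0; 2] for the example above
```

The product has numel(smallUserIDList) * numel(smallItemIDList) rows, so this variant may use a lot of memory when both lists are large.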