So this problem we have users matching to other online users. However it is not just a one to one match. A user is given a selection of 5 other users to choose from, which are then marked as seen and should not be shown again when the user requests for another 5 users to be shown. More people can come online during the process.
The problem is, I want a way for each user to be shown in the selection for other users, with redis but an algorithm is mostly what im looking for. I'm trying to implement this in the fastest way possible, using redis if possible but I can also make calls to the database if it's needed.
My current solution is as follows, hopefully someone will have some tips to improve this from O(N) calls.
So each user needs to have a seen set of user_id
s. We can have a redis list (queue) of onlineusers
. Where we keep poppping users from the left until we find one that isn't in the user's seen set, save it, add to users seen, then push it on the right. Then once we get 5 of those we left push back the ones we left popped off that were already seen.
This is the best I could think of however it is O(N) each time we want to find 5 users for this one user to select from. It's possible (though not likely) that the user has seen a huge amount and is popping off the whole list.
To help understand this better. A naiive approach is to have every single user contain a copy of all online users in the form of a set. So then we simply pop 5 random set members. But this can't work because theres not enough space, and each time a user goes online they'd have to be added to each user's online users. Or deleted when they go offline and those operations are O(N) considering they are done for N users at O(1)
Does anyone have any tips to match users with other users?
It would be good to know about which kind of data we are talking about. How many users exist? How many will be online at average? How is the ratio of "seen users" compared to all users (sparse vs. dense)?
Modification of your algorithm Don't pop the first but choose a random element from the set of online users. This should improve balancing and may help with amortized complexity depending on the ratio of these two sets!
Alternative Algorithm (more structured; still bad worst-case; should be good if sparse seen)
Depending on the data, this should work very good if data is huge and seen is sparse!