We have a news feed, and we want to surface items to the user based on a number of criteria. Certain items will be surfaced because of factor A, another because of factor B, and yet another because of factor C. We can create individual heuristics for each factor, but we then need to combine these heuristics in such a way that it promotes the best content considering each factor while still giving a mix of content from each factor.
Our naive approach is to load the top n
from each factor, take the first of each, and make those the first 3 of the feed. Then take the 2nd from each feed and make that the second 3, and so on and so forth. Ideally, we would have some algorithm for more intelligently ranking these feed items - our first thought was to simply sum the three heuristics and pull the top items using the resulting combined score, but there are no guarantees that the heuristics are evenly-scaled (or are evenly-scaled for that particular user), which could result in one factor dominating over the others in the feed. Is there some more intelligent way of ranking these news feed items (akin to what Facebook does in its pseudo-chronological news feed)?
If your final combined heuristic does not need to be admissible, it can do no harm to use a sum of the original heuristics as your final heuristic. The problem here is that the original heuristics are probably not of the same dimension, for instance A has values ranging from 0 to 100 and B has values from -1 to +1. I suggest using following formula to calculate the combined heuristic for an item, that ignores the dimensions of the particular heuristics:
H = (A - min(A))/(max(A) - min(A)) + (B - min(B))/(max(B) - min(B)) + (C - min(C))/(max(C) - min(C))
Of course, to find the
min
andmax
values for each heuristic, you need understanding of the meaning of each individual heuristic. I am not sure this solves your problem, but i hope it does.You have many categories. Let's say A, B and C.
Combining everything together and rank it(You mentioned that we would have some algorithm for more intelligently ranking these feed items) without depending on the category.
Show the first 4-5 items in the ranked list independent of category.
If you have Sponsored feed items(like facebook), then show the top ranked sponsored feed item( If the ranks are 16,27,39,etc then show 16 after the 5) and likewise.
Then enter to the category.
If the user have the ability to subscribe to category, then show posts based on the categories.
For example
A have 10 items say a1...a10
B have 10 items say b1...b10
likewise C have 10 items say c1...c10
If the user opted for mainly category B, then show top ranked in b, then 6th ranked in list, second top ranked in b, from list, etc.
After 10-12 items,
Show the items from each category based on the rank order.
If the user didn't opted for a specific category, then the ranking order should be maintained to 8-10 items and then chose from each category based on the rank order.
Side tip
When implementing new algorithm, it will always be helpful if you collect the feedback from user from their experience.
The user should get their preferred contents first, then the contents that are top in each category.
For that, always refer the user activities and browsing history of each category and each type of post.
I want to add to the point made by Arne Van Den Kerchove - Normalization.
I would suggest another layer that:
Defines the new Heuristic direction:
If optimal A,B,C differ in their direction, e.g. optimal A is low, but optimal B is high. This heuristic is the positive square root of the squares of the normalized factors, so higher is better.
Will allow to incorporate user response based on the amount of attention (weight) the user assigns to each metrics.
Here is how I imagine it:
Alpha, beta and gamma are weights and will start as [1,1,1] unless you have knowledge that one of the metrics is preferred.
These weights shall change with each user response.
For example:
If a user chooses something that ranks as follows:
You can add a fraction of the difference between the relative values to alpha, beta and gamma respectively. This way you will have a dynamic rating that not only calculates the factors as you already do, but also adjusts to what the user cares about.
For the example above, if we add the full difference, the new alpha, beta and gamma will be [1.0322,0.9456,1.0222] respectively.
(Subtract the average (0.1778) from the relative values [0.21,0.1234,0.2] and add the result to the initial set [1,1,1])
This way the new relevant item set will be dictated by the user's cumulative choices.
I am not very sure about Facebook, but I have seen something that Netflix did and If you have enough labeled data, (user history of response to your heuristic ranking) you can try it. It's using Matrix Factorization with a special loss function to get ranks and they did achieve very good results ! The link to the presentation.
If this seems so complicated (and in a way it's), and you have enough data to do MF, I suggest that you try it and interpret the inferred number as you guide for ranking. Actually what you will be predicting is the affinity of your user to each of your user news feed, so the higher the affinity, the higher the rank and vice versa.