
Apriori Algorithm - frequent item set generation

Posted 2019-09-10 03:51

Question:

I am using the Apriori algorithm to identify the customer's frequent item sets. Based on these frequent item sets, I want to suggest items to the customer when they add a new item to their shopping list. I got the following frequent item sets:

[1],[3],[2],[5]
[2,3],[3,5],[1,3],[2,5]
[2,3,5]
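For context, here is a minimal sketch of how itemsets like these come out of Apriori's level-wise search (count single items, then repeatedly join, prune and count). The transaction list is a hypothetical assumption, not the asker's data; with a minimum support count of 2 it happens to reproduce exactly the itemsets listed above. `apriori` and `support_count` are illustrative names, not a library API.

```python
from itertools import combinations

# Hypothetical transactions (assumption, not from the question). With a minimum
# support count of 2 they yield the same frequent itemsets as listed above.
TRANSACTIONS = [
    {1, 3, 4},
    {2, 3, 5},
    {1, 2, 3, 5},
    {2, 5},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items.
    current = [frozenset([i]) for i in items
               if support_count(frozenset([i]), transactions) >= min_count]
    frequent = {s: support_count(s, transactions) for s in current}
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count step: keep the candidates that meet the minimum support.
        current = [c for c in candidates if support_count(c, transactions) >= min_count]
        frequent.update({c: support_count(c, transactions) for c in current})
        k += 1
    return frequent


if __name__ == "__main__":
    result = apriori(TRANSACTIONS, min_count=2)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

The prune step is what saves work: on this toy data the candidate {1,2,3} is dropped without being counted, because its subset {1,2} appears in only one transaction.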

My question: am I wrong to consider only the set [2,3,5] when making suggestions to the customer? That is, if the customer adds item 3 to the shopping list, I would recommend items 2 and 5. If the customer adds item 1, no suggestion is made, because I consider only the set [2,3,5] and item 1 is not in it. Is this logic (considering only [2,3,5]) sufficient for making suggestions to the user?

Answer 1:

No. Deriving recommendation rules requires more effort.

Just because [2,3,5] is frequent does not mean 2 -> 3,5 is a good rule.

Suppose 2 is a very popular product, while 3 and 5 are only barely frequent. Think of a gas station: [gas, coffee, bagel] is probably a frequent itemset, but relatively few customers who buy gas also buy coffee and a bagel, so the rule gas -> coffee, bagel has low confidence.

You do want to consider rules such as 2,3 -> 5, because they may have higher confidence. For example, if the customer buys gas and coffee, suggest a bagel.
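A minimal sketch of the confidence measure, conf(A -> C) = support(A ∪ C) / support(A), applied to the two rules mentioned above (2 -> 3,5 and 2,3 -> 5). Since the question gives no transactions, it uses a hypothetical four-transaction dataset (an assumption) whose frequent itemsets match the question's; `Fraction` keeps the ratios exact.

```python
from fractions import Fraction

# Hypothetical transactions (assumption): a toy basket log whose frequent
# itemsets (minimum support count 2) match the ones in the question.
TRANSACTIONS = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """conf(A -> C) = support(A ∪ C) / support(A): how often C is bought when A is."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(confidence({2}, {3, 5}, TRANSACTIONS))  # 2/3
print(confidence({2, 3}, {5}, TRANSACTIONS))  # 1
```

On this toy data, 2 -> 3,5 holds in only 2 of the 3 baskets containing 2, while 2,3 -> 5 holds in every basket containing 2 and 3: the same frequent itemset gives rules of quite different quality.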

Frequency is not sufficient for recommendations! Suppose 2 and 3 are bought together in 80% of transactions, and 2, 3, 5 in 60%. Naively, in 6 out of 8 such cases the customer also buys 5, so the suggestion is right 75% of the time. But this does not mean 5 is a good recommendation! If 5 appears in 80% of all transactions, a customer who has bought 2 and 3 is actually 5 percentage points less likely to buy 5 than an average customer, so there is a negative correlation. That's why you need to look at lift, too, or at one of the many similar measures.
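The arithmetic in the paragraph above, written out as a small sketch: confidence is support({2,3,5}) / support({2,3}), and lift divides that by the baseline support of 5. A lift below 1 signals the negative correlation. The percentages are the answer's own hypothetical numbers; the function names are illustrative.

```python
from fractions import Fraction


def confidence(sup_antecedent, sup_both):
    """conf(A -> C) = support(A ∪ C) / support(A)."""
    return sup_both / sup_antecedent


def lift(sup_antecedent, sup_both, sup_consequent):
    """lift(A -> C) = conf(A -> C) / support(C); below 1 means negative correlation."""
    return confidence(sup_antecedent, sup_both) / sup_consequent


# Supports from the example, as exact fractions of all transactions.
SUP_23 = Fraction(80, 100)   # {2, 3}
SUP_235 = Fraction(60, 100)  # {2, 3, 5}
SUP_5 = Fraction(80, 100)    # {5}

print(float(confidence(SUP_23, SUP_235)))   # 0.75
print(float(lift(SUP_23, SUP_235, SUP_5)))  # 0.9375
```

A lift of 0.9375 means that buying 2 and 3 makes 5 slightly less likely than its 80% baseline, even though the rule 2,3 -> 5 is right 75% of the time.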



Answer 2:

To derive rules, compare the frequency of an itemset with the frequencies of its subsets. For example:

  1. If the frequency of (2,3,5) is close to the frequency of (3,5), the rule is (3,5) -> 2.
  2. If the frequency of (2,3,5) is close to the frequency of (3), the rule is 3 -> (2,5).
  3. If the frequency of (2,3) is close to the frequency of (2), the rule is 2 -> 3.
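A sketch of the approach in this list: every frequent itemset with two or more items, not just the largest, is split into antecedent and consequent, and the ratio of the two frequencies (the rule's confidence) decides whether the rule is kept. The support counts come from a hypothetical four-transaction dataset (an assumption) whose frequent itemsets match the question's; `generate_rules` and the 0.8 threshold are illustrative choices.

```python
from itertools import combinations

# Hypothetical support counts (assumption): the question's frequent itemsets,
# counted on 4 toy transactions {1,3,4}, {2,3,5}, {1,2,3,5}, {2,5}.
FREQUENT = {
    frozenset({1}): 2, frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({1, 3}): 2, frozenset({2, 3}): 2, frozenset({2, 5}): 3,
    frozenset({3, 5}): 2, frozenset({2, 3, 5}): 2,
}


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule A -> C where
    A ∪ C is a frequent itemset of size >= 2 and confidence >= min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        # Every frequent itemset with 2+ items is used, not only the largest one.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Confidence = freq(itemset) / freq(antecedent); it is close to 1
                # when the two frequencies are close. Subsets of frequent itemsets
                # are frequent, so this lookup always succeeds.
                conf = count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


for antecedent, consequent, conf in generate_rules(FREQUENT, min_confidence=0.8):
    print(sorted(antecedent), "->", sorted(consequent), f"confidence={conf:.2f}")
```

On this toy data the kept rules include 1 -> 3, so a customer adding item 1 would get a suggestion that the [2,3,5]-only scheme never produces, alongside (2,3) -> 5 and (3,5) -> 2 from the largest itemset.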

This means that not only the largest frequent itemset but also its frequent subsets can be used to derive rules. The rules become more precise if you take into account how close the frequencies of the itemsets are to one another.
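Combining the two answers, here is a hedged sketch of the suggestion step from the question: look at rules from all frequent itemsets whose antecedent is already in the basket, keep those with enough confidence and a lift above 1, and rank the suggested items by confidence. The data (a hypothetical four-transaction dataset whose support counts match the question's itemsets), the thresholds, and names such as `recommend` are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical data (assumption): support counts of the question's frequent
# itemsets, computed on 4 toy transactions {1,3,4}, {2,3,5}, {1,2,3,5}, {2,5}.
N_TRANSACTIONS = 4
FREQUENT = {
    frozenset({1}): 2, frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({1, 3}): 2, frozenset({2, 3}): 2, frozenset({2, 5}): 3,
    frozenset({3, 5}): 2, frozenset({2, 3, 5}): 2,
}


def recommend(basket, frequent, n_transactions, min_confidence=0.6, lift_threshold=1.0):
    """Rank items not yet in `basket` by the best confidence of any rule A -> C where
    A is a subset of the basket, C shares no item with it, confidence >= min_confidence,
    and lift > lift_threshold."""
    basket = frozenset(basket)
    best = {}  # suggested item -> highest confidence of a qualifying rule
    for itemset, count in frequent.items():
        # Try every split of the itemset into a non-empty antecedent and consequent.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                if not antecedent <= basket or consequent & basket:
                    continue
                conf = count / frequent[antecedent]
                lift = conf / (frequent[consequent] / n_transactions)
                if conf >= min_confidence and lift > lift_threshold:
                    for item in consequent:
                        best[item] = max(best.get(item, 0.0), conf)
    return sorted(best, key=lambda item: (-best[item], item))


print(recommend({3}, FREQUENT, N_TRANSACTIONS))     # [1]
print(recommend({1}, FREQUENT, N_TRANSACTIONS))     # [3]
print(recommend({2, 3}, FREQUENT, N_TRANSACTIONS))  # [5, 1]
```

On this toy data, adding item 3 suggests item 1 rather than 2 and 5: the rules 3 -> 2 and 3 -> 5 have confidence 2/3 but lift below 1, because 2 and 5 each appear in 3 of the 4 baskets anyway. That is an artifact of the tiny example, but it shows why checking lift matters.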