Within the apriori function, I want the outcome to only contain these two variables in the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. The RHS should only contain attributes from the column Product. For instance:
# lhs rhs support confidence lift
# 1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black} 0.2500000 0.2500000 1.000000
# 2 {HouseOwnerFlag=1} => {Product=Adventure Works 26" 720p} 0.2500000 0.2500000 1.000000
# 3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver} 0.1666667 0.3333333 1.333333
# 4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900} 0.1666667 0.3333333 1.333333
Part of the problem is solved in this question: R arules, mine only rules from specific column
So now I use the following:
rules <- apriori(sales, parameter=list(support =0.01, confidence =0.8, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))
Then I use this from that other SO question to ensure that only the Product column is on the RHS:
inspect( subset( rules, subset = rhs %pin% "Product=" ) )
The outcome is like this:
# lhs rhs support confidence lift
# 1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works } => {Product=SV 16xDVD M360 Black} 0.2500000 0.2500000 1.000000
# 2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video } => {Product=Adventure Works 26" 720p} 0.2500000 0.2500000 1.000000
# 3 {BrandName=Southridge Video, NumberChildrenAtHome=0 } => {Product=Litware Wall Lamp E3015 Silver} 0.1666667 0.3333333 1.333333
# 4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170 } => {Product=Contoso Coffee Maker 5C E0900} 0.1666667 0.3333333 1.333333
So apparently the LHS can contain every possible column, not just HouseOwnerFlag as I specified. From other Stack Overflow questions, I see that I can put default="rhs" in the apriori function, like so:
rules <- apriori(sales, parameter=list(support =0.001, confidence =0.5, minlen=2), appearance = list(lhs=c("HouseOwnerFlag=0", "HouseOwnerFlag=1"), default="rhs"))
Then upon inspecting (without the subset part, just inspect(rules)), there are far fewer rules (7) than before, but they do indeed only contain HouseOwnerFlag in the LHS:
# lhs rhs support confidence lift
# 1 {HouseOwnerFlag=0} => {MaritalStatus=S} 0.2500000 0.2500000 1.000000
# 2 {HouseOwnerFlag=1} => {Gender=M} 0.2500000 0.2500000 1.000000
# 3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0} 0.1666667 0.3333333 1.333333
# 4 {HouseOwnerFlag=1} => {Gender=M} 0.1666667 0.3333333 1.333333
However, there is nothing from the column Product on the RHS, so there is no point in inspecting it with subset, as of course it would return null. I tested it several times with different support values to see whether Product would appear, but the same 7 rules remain.
So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS (Product)? What am I doing wrong?
EDIT: You can reproduce this problem by downloading this test dataset from https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0
Mind you, I only took the first 20 rows from a huge dataset, so unfortunately the output here won't have the same product names as the example I displayed above. But the problem remains the same: I want to get only HouseOwnerFlag=0 and/or HouseOwnerFlag=1 on the LHS and the column Product on the RHS.