
Adding item information to a transaction object in arules

Posted 2019-05-31 00:49

Question:

I am using the arules package to find association rules in point-of-sale retail data. I am extracting transaction detail from a database, then placing in a transaction object. I'm new to arules and am trying to figure out how to populate the itemInfo data frame in the transaction object. Right now, I'm just bringing in the transaction and item IDs (both numeric), which provide little context. I would like to be able to add an item description, as well as product hierarchy levels.

Below is the process I'm using today:

  1. Data comes from the database in the following format:

    Transaction_ID     Item_ID
    --------------     ----------- 
    100                1
    100                2
    100                3
    101                2
    101                3
    102                1
    102                2
    
  2. To create the transaction object, I use the following command, as described in the arules documentation:

    txdata <- as(split(txdata[, "Item_ID"], txdata[, "Transaction_ID"]), "transactions")
    

    Note: I've found that Item_ID needs to be numeric. With string IDs I run into major performance problems, because split() is slow on factors built from strings.

  3. Create and view the association rules:

    rules <- apriori(txdata, parameter = list(support=0.00015, confidence=0.5))
    inspect(head(sort(rules, by = "confidence"), n = 5))
    

When the rules come back, they are listed by Item_ID, which is not helpful to me. I want to display them by ID and/or description. I would also like to use the aggregation features built into the arules package.
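Step 2 depends on split() to turn the long two-column table into one vector of Item_IDs per transaction. arules then coerces that named list into a transactions object. The following base-R sketch shows only that intermediate step, using the sample rows from step 1. The data frame is typed in by hand here and is not read from a database.

```r
# The sample rows from step 1, entered by hand instead of queried from the database.
txdata <- data.frame(
  Transaction_ID = c(100, 100, 100, 101, 101, 102, 102),
  Item_ID        = c(1, 2, 3, 2, 3, 1, 2)
)

# split() groups Item_ID by Transaction_ID. The result is a named list with
# one basket (a vector of item IDs) per transaction; names are the IDs as strings.
baskets <- split(txdata[, "Item_ID"], txdata[, "Transaction_ID"])
str(baskets)

# With arules loaded, as(baskets, "transactions") turns this list into the
# transaction object that apriori() uses in step 3.
```

Each list element becomes one transaction, and the list names become the transactionID values that inspect() prints.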

Answer 1:

You can change the names of items using itemInfo. Here is an example:

R> df <- data.frame(
   TID = c(1,1,2,2,2,3), 
   item=c("a","b","a","b","c", "b")
 )
R> trans <- as(split(df[,"item"], df[,"TID"]), "transactions")

### this is how you replace item labels and set a hierarchy (here level1);
### the new labels must follow the order of itemLabels(trans), here a, b, c
R> myLabels <- c("milk", "butter", "beer")
R> myLevel1 <- c("dairy", "dairy", "beverage")
R> itemInfo(trans) <- data.frame(labels = myLabels, level1 = myLevel1)

R> inspect(trans)
     items    transactionID
  1 {milk,                
     butter}             1
  2 {milk,                
     butter,              
     beer}               2
  3 {butter}             3

### now you can use aggregate()
R> inspect(aggregate(trans, itemInfo(trans)[["level1"]]))
     items      transactionID
  1 {dairy}                1
  2 {beverage,              
     dairy}                2
  3 {dairy}                3
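The example above assigns the new labels by position, so each label has to line up with the existing item order. The question's numeric Item_IDs come from a product table, and the item order inside the transaction object is not guaranteed to match that table. A safer approach may be to look up each existing label with match(). The sketch below is written under that assumption. `build_item_info` is a hypothetical helper and not part of arules. `item_lookup` and its `Description` and `Category` columns stand in for whatever the product table contains. Each new label combines the ID and the description so that labels stay unique even when two items share a description.

```r
# Hypothetical helper (not part of arules): build an itemInfo-style data
# frame whose rows follow the order of the existing item labels.
build_item_info <- function(item_labels, lookup) {
  # Position of each existing label in the lookup table (NA if absent).
  idx <- match(item_labels, as.character(lookup$Item_ID))
  if (anyNA(idx)) {
    stop("no lookup entry for item(s): ",
         paste(item_labels[is.na(idx)], collapse = ", "))
  }
  data.frame(
    # Display label "ID description", unique even if descriptions repeat.
    labels      = paste(item_labels, lookup$Description[idx]),
    Item_ID     = item_labels,
    Description = lookup$Description[idx],
    level1      = lookup$Category[idx],
    stringsAsFactors = FALSE
  )
}

# Hypothetical product table, deliberately not in Item_ID order.
item_lookup <- data.frame(
  Item_ID     = c(3, 1, 2),
  Description = c("beer", "milk", "butter"),
  Category    = c("beverage", "dairy", "dairy"),
  stringsAsFactors = FALSE
)

# Stand-in for itemLabels(txdata), which returns the item IDs as strings.
info <- build_item_info(c("1", "2", "3"), item_lookup)
print(info)
```

With arules loaded, the result would be assigned as `itemInfo(txdata) <- build_item_info(itemLabels(txdata), item_lookup)`. Do this before calling apriori() so that the mined rules carry the new labels. The `level1` column can then be passed to aggregate() exactly as in the example above.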

You can find more information with class?transactions and ?aggregate.

Hope this helps, Michael



Tags: r arules