-->

How to convert a data.frame to transactions for arules

Posted 2019-01-10 17:39

Question:

I read data from a CSV file. The data has three columns: one is the transaction ID, and the other two are the product and the product category. I need to convert this into transactions in order to use the apriori function in arules, but I get an error when I convert it:

dat <- read.csv("spss.csv", header = TRUE, sep = ",", as.is = TRUE)
dat[,2] <- factor(dat[,2])
dat[,3] <- factor(dat[,3])
spssdat <- dat[,c(1,2,3)]
str(spssdat)

'data.frame':   108919 obs. of  3 variables:
 $ Transaction_id: int  3000312 3000312 3001972 3003361 3003361 3003361 3003361 3003361 3003361 3004637 ...
 $ product_catalog : Factor w/ 9 levels "AIM","BA","IM",..: 1 1 5 7 7 7 7 7 7 1 ...
 $ product      : Factor w/ 332 levels "ACM","ACTG/AIM",..: 7 7 159 61 61 61 61 61 61 7 ...

trans4 <- as(spssdat, "transactions")

Error in as(spssdat, "transactions") : 
  no method or default for coercing “data.frame” to “transactions”

If the data has only two columns, this works:

trans4 <- as(split(spssdat[,2], spssdat[,1]), "transactions")

But I don't know how to do the conversion when I have three columns. There are usually additional columns such as category attributes and customer attributes, so the data usually has more than two columns, and I need to find rules across multiple columns.

Answer 1:

I found some information on this website that worked for me. Let me copy the relevant paragraph:

The dataframe can be in either a normalized (single) form or a flat file (basket) form.
When the file is in basket form it means that each record represents a transaction where the items in the basket are represented by columns.
When the dataset is in single form it means that each record represents one single item and each item contains a transaction id.
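
To make the two layouts concrete, here is a minimal base-R sketch. The transaction IDs and item names are made-up toy values, not taken from the question's data.

```r
# Single (normalized) form: one row per item, each row carries a transaction id.
# The ids and item names below are hypothetical toy values.
single <- data.frame(
  transactionID = c(1, 1, 2, 3, 3, 3),
  productID     = c("bread", "milk", "milk", "bread", "butter", "milk"),
  stringsAsFactors = FALSE
)

# Basket form: one record per transaction, holding all of its items.
# split() groups the item column by transaction id and returns a named list.
basket <- split(single$productID, single$transactionID)

print(single)
print(basket)
```

The list produced by split() has the same shape as the two-column workaround in the question, which is why that workaround can be coerced to transactions.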

To load transactions from a file, use read.transactions. In both your case and mine, the file is in single form.
I used the following code to load a .csv file as transactions:

trans = read.transactions("some_data.csv", format = "single", sep = ",", cols = c("transactionID", "productID"))

To fully understand the above command, take a look at the read.transactions manual page, available by typing ?read.transactions in the R console.
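
As a rough end-to-end sketch of the same call, the following writes a small toy file in single form to a temporary path and reads it back. The file contents are assumptions for illustration. `header = TRUE` is passed because the first line of the toy file holds the column names that `cols` refers to; that argument is available in recent arules releases and may be absent in very old ones. The arules part is skipped if the package is not installed.

```r
# Toy data in single form; the rows are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "transactionID,productID",
  "1,bread",
  "1,milk",
  "2,milk",
  "3,bread",
  "3,butter"
), csv_path)

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  # cols names the transaction-id column first and the item column second.
  trans <- read.transactions(csv_path, format = "single", header = TRUE,
                             sep = ",", cols = c("transactionID", "productID"))
  inspect(trans)
} else {
  message("The arules package is not installed; skipping read.transactions().")
}
```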



Answer 2:

I was attempting to do the same thing, and even after converting every column of my data.frame to a factor, I still could not coerce it into an itemMatrix of transactions. Then I realized I had never re-loaded the "arules" package in the session I was working in. A very stupid mistake, but I wanted to mention it in case anyone else runs into the same problem. Try the simple stuff first:

library("arules")


Answer 3:

You need to first convert "Transaction_id" into a factor variable.
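
A minimal sketch of this suggestion on toy data that reuses the question's column names (the values are made up). Note that when a data.frame whose columns are all factors is coerced, each row becomes its own transaction, so the transaction ID turns into just another item. The second half of the sketch is an alternative that is an assumption and not part of any answer here: it labels the items from both columns and groups them by ID, which may be closer to what the question asks for. The arules part is skipped if the package is not installed.

```r
# Toy data with the question's column names; the values are made up.
spssdat <- data.frame(
  Transaction_id  = c(3000312L, 3000312L, 3001972L, 3003361L),
  product_catalog = factor(c("AIM", "AIM", "IM", "BA")),
  product         = factor(c("ACM", "ACTG/AIM", "ACM", "ACM"))
)

# The suggestion above: make the id column a factor as well.
spssdat$Transaction_id <- factor(spssdat$Transaction_id)
str(spssdat)

# Alternative (an assumption, not from the answers): label items from both
# columns and group them by transaction id, so each id is one transaction.
items <- c(paste0("product=", spssdat$product),
           paste0("catalog=", spssdat$product_catalog))
ids <- rep(spssdat$Transaction_id, times = 2)
# unique() drops repeated items, since a transaction lists each item once.
item_list <- lapply(split(items, ids), unique)
print(item_list)

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans_rows <- as(spssdat, "transactions")    # one transaction per row
  trans_ids  <- as(item_list, "transactions")  # one transaction per id
  inspect(trans_ids)
}
```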