I am trying to use Mahout to generate recommendations, but I am getting none.
My dataset:
0,102,5.0
1,101,5.0
1,102,5.0
Code:
DataModel datamodel = new FileDataModel(new File("dataset.csv"));
// Create the UserSimilarity object.
UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);
// Create the UserNeighborhood object (similarity threshold 0.1).
UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel);
// Create the user-based recommender.
UserBasedRecommender recommender = new GenericUserBasedRecommender(datamodel, userneighborhood, usersimilarity);
// Ask for 1 recommendation for user 0.
List<RecommendedItem> recommendations = recommender.recommend(0, 1);
for (RecommendedItem recommendation : recommendations) {
    System.out.println(recommendation);
}
I am using Mahout version 0.13.0.
Ideally, it should recommend item_id = 101 to user_id = 0: user = 0 and user = 1 have item 102 in common, so item_id = 101 should be recommended to user_id = 0.
Logs:
18:08:11.669 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file dataset.csv
18:08:11.700 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
18:08:11.702 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 3
18:08:11.722 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 2 users
18:08:11.738 [main] DEBUG org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender - Recommending items for user ID '0'
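One possible reason for the empty result can be sketched in plain Java, without Mahout. Pearson correlation is computed only over the items both users have rated; with a single co-rated item (102), both users have zero variance on the overlap and the correlation is undefined. The sketch below is not Mahout's implementation, and whether Mahout 0.13.0 handles this case in exactly the same way is an assumption; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch (no Mahout): Pearson correlation over co-rated items only.
class PearsonOverlapSketch {

    // Returns NaN when the correlation is undefined
    // (no overlap, or zero variance on either side of the overlap).
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0.0, sumB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                n++;
                sumA += e.getValue();
                sumB += rb;
            }
        }
        if (n == 0) {
            return Double.NaN;
        }
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                double da = e.getValue() - meanA;
                double db = rb - meanB;
                cov += da * db;
                varA += da * da;
                varB += db * db;
            }
        }
        double denom = Math.sqrt(varA) * Math.sqrt(varB);
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    public static void main(String[] args) {
        // The three rows of dataset.csv from the question.
        Map<Long, Double> user0 = new LinkedHashMap<>();
        user0.put(102L, 5.0);
        Map<Long, Double> user1 = new LinkedHashMap<>();
        user1.put(101L, 5.0);
        user1.put(102L, 5.0);

        double sim = pearson(user0, user1);
        System.out.println("similarity(0, 1) = " + sim);              // NaN
        System.out.println("passes 0.1 threshold: " + (sim >= 0.1));  // false
    }
}
```

If the library behaves like this sketch, user 1 never enters user 0's neighborhood, so there is nobody to borrow item 101 from. A dataset with more overlapping, varied ratings would be needed before a Pearson-based neighborhood can form.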
The Hadoop MapReduce code in Mahout is being deprecated. The new recommender code starts with @rawkintrevo's examples. If you are a Scala programmer, follow them.
Most engineers would like a system that works with no modification. The Mahout algorithm is encapsulated in The Universal Recommender, built on top of Apache PredictionIO. It has a server to accept events like the ones in your example, internal event storage, and a query server for results. There are numerous improvements over the old MapReduce code, including using real-time user behavior to make recommendations. Neither the new Mahout nor the old one includes servers for input and query; the Universal Recommender has REST endpoints for both.
Given that the code you are using will be deprecated, I strongly suggest that you dive into the Mahout code (@rawkintrevo's examples) or look at The Universal Recommender, which is an entire end-to-end system.
- Install PredictionIO with a "single machine" setup here or, to really shortcut setup, use our prepackaged AWS AMI here. It includes PIO and The Universal Recommender pre-installed.
- Add the UR Template here
- A Java SDK for sending events to the recommender here
Once you have this set up, you only deal with configuration, the REST API or the Java SDK, and the PIO CLI. No Scala coding is required.
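To give a feel for the REST side, here is a minimal plain-Java sketch that turns the rows of the question's dataset.csv into JSON event bodies in the shape the PredictionIO event API accepts, as far as I understand it (event, entityType, entityId, targetEntityType, targetEntityId). The event name "purchase", the decision to drop the rating, and the class and method names are assumptions for illustration. Each body would be POSTed to the event server's events.json endpoint together with an access key; host, port, and key are deployment-specific and not shown.

```java
// Hedged sketch: builds the JSON body for one PredictionIO event per CSV row.
// It does not send anything over the network.
class PioEventSketch {

    // Assumption: event names and ids contain no characters that need JSON escaping.
    static String toEventJson(String event, String userId, String itemId) {
        return String.format(
            "{\"event\":\"%s\",\"entityType\":\"user\",\"entityId\":\"%s\","
            + "\"targetEntityType\":\"item\",\"targetEntityId\":\"%s\"}",
            event, userId, itemId);
    }

    public static void main(String[] args) {
        // Rows from the question (userId,itemId,rating); the rating is dropped here.
        String[] rows = {"0,102,5.0", "1,101,5.0", "1,102,5.0"};
        for (String row : rows) {
            String[] fields = row.split(",");
            // "purchase" is a hypothetical event name chosen for this sketch.
            System.out.println(toEventJson("purchase", fields[0], fields[1]));
        }
    }
}
```

In practice the Java SDK mentioned above would take care of building and sending these bodies; the sketch only shows what an event roughly looks like on the wire.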
I have three examples that are based on version 0.13.0 (and on Scala, which is required for Samsara, the R-like Scala DSL that Mahout has used since v0.10).
Walk
The first example is a very slow walkthrough:
https://gist.github.com/rawkintrevo/3869030ff1a731d43c5e77979a5bf4a8
It is meant as a companion to Pat Ferrel's blog post/slide deck, found here:
http://actionml.com/blog/cco
Crawl
The second example is a little more "real" in that it uses SimilarityAnalysis.cooccurrencesIDSs(..., which is the proper interface for the CCO algorithm.
https://gist.github.com/rawkintrevo/c1bb00896263bdc067ddcd8299f4794c
Run
Here we use 'real' data. The MovieLens data set doesn't have enough going on to showcase CCO's multi-modal power (the ability to recommend on multiple user behaviors). Here we load 'real' data and generate recommendations.
https://gist.github.com/rawkintrevo/f87cc89f4d337d7ffea80a6af3bee83e
Conclusion
I know you specifically asked for Java; however, Apache Mahout isn't geared toward Java at the moment. In theory you could import the Scala code into your Java project, or wrap the functions in another, more Java-friendly function. I've heard rumors late at night (or possibly in a dream) that some grad students somewhere were working on a Java API, but it's not in the trunk at the moment, nor is there a PR, nor is there a bullet on the roadmap.
Hope the above provides some insight.
Appendix
The most trivial example for Stack Overflow (you can run this interactively in the Mahout Spark shell by typing $MAHOUT_HOME/bin/mahout spark-shell, assuming SPARK_HOME, JAVA_HOME, and MAHOUT_HOME are set):
val inputRDD = sc.parallelize(Array(
  ("u1", "purchase", "iphone"),
  ("u1", "purchase", "ipad"),
  ("u2", "purchase", "nexus"),
  ("u2", "purchase", "galaxy"),
  ("u3", "purchase", "surface"),
  ("u4", "purchase", "iphone"),
  ("u4", "purchase", "galaxy"),
  ("u1", "category-browse", "phones"),
  ("u1", "category-browse", "electronics"),
  ("u1", "category-browse", "service"),
  ("u2", "category-browse", "accessories"),
  ("u2", "category-browse", "tablets"),
  ("u3", "category-browse", "accessories"),
  ("u3", "category-browse", "service"),
  ("u4", "category-browse", "phones"),
  ("u4", "category-browse", "tablets")))
import org.apache.mahout.math.indexeddataset.{IndexedDataset, BiDictionary}
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
val purchasesIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "purchase").map(o => (o._1, o._3)))(sc)
val browseIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "category-browse").map(o => (o._1, o._3)))(sc)
import org.apache.mahout.math.cf.SimilarityAnalysis
val llrDrmList = SimilarityAnalysis.cooccurrencesIDSs(
  Array(purchasesIDS, browseIDS),
  randomSeed = 1234,
  maxInterestingItemsPerThing = 3,
  maxNumInteractions = 4)
val llrAtA = llrDrmList(0).matrix.collect
IndexedDatasetSpark.apply(...) requires an RDD[(String, String)], where the first string is the 'row' (e.g. the user) and the second string is the 'behavior', which becomes the column. For the 'buy matrix' the columns would be products, but this could also be a 'gender' matrix with two columns (male/female).
You then pass an array of IndexedDatasets to SimilarityAnalysis.cooccurrencesIDSs(...).
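Since the question asked for Java: the filter/map step that shapes the input is not Mahout-specific. A minimal plain-Java sketch of the same shaping, turning (user, action, item) triples into one list of (row, column) pairs per action, might look like this. The class and method names are hypothetical, and the sketch stops short of Spark and Mahout; it only mirrors what the filter(...).map(...) calls in the Scala example do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the shaping step: (user, action, item) -> (user, item) per action.
class PairShapingSketch {

    // Keeps only the events with the given action and returns (row, column) pairs.
    static List<String[]> pairsFor(String action, List<String[]> events) {
        List<String[]> pairs = new ArrayList<>();
        for (String[] event : events) {
            if (event[1].equals(action)) {
                pairs.add(new String[] {event[0], event[2]});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // A subset of the events from the Scala example.
        List<String[]> events = Arrays.asList(
            new String[] {"u1", "purchase", "iphone"},
            new String[] {"u1", "purchase", "ipad"},
            new String[] {"u2", "purchase", "nexus"},
            new String[] {"u1", "category-browse", "phones"},
            new String[] {"u2", "category-browse", "tablets"});

        // One pair list per behavior, matching purchasesIDS and browseIDS in spirit.
        for (String action : new String[] {"purchase", "category-browse"}) {
            System.out.println(action + ":");
            for (String[] pair : pairsFor(action, events)) {
                System.out.println("  " + pair[0] + " -> " + pair[1]);
            }
        }
    }
}
```

Each resulting list corresponds to one matrix: rows are users, columns are whatever the behavior refers to (products for purchases, categories for browsing).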