Apache Mahout not giving any recommendation

2019-08-08 20:54发布

问题:

I am trying to use mahout for the recommendation but getting none.

My dataset :

0,102,5.0
1,101,5.0
1,102,5.0

Code :

      DataModel datamodel = new FileDataModel(new File("dataset.csv"));
        // Creating UserSimilarity object.
        UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);

        // Creating UserNeighbourHHood object.
        UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel);

        // Create UserRecomender
        UserBasedRecommender recommender = new GenericUserBasedRecommender(datamodel, userneighborhood, usersimilarity);

        List<RecommendedItem> recommendations = recommender.recommend(0, 1);

        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }

I am using Mahout version : 0.13.0

Ideally, it should recommend item_id = 101' to 'user_id = 0' asuser = 0anduser = 1have item 102 common show it should recommenditem_id = 101touser_id = 0`

Logs :

18:08:11.669 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file dataset.csv
18:08:11.700 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
18:08:11.702 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 3
18:08:11.722 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 2 users
18:08:11.738 [main] DEBUG org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender - Recommending items for user ID '0'

回答1:

The Hadoop Mapreduce code in Mahout is being deprecated. The new recommender code starts with @rawkintrevo 's examples. If you are a Scala programmer follow them.

Most Engineers would like a system that works with no modification, The Mahout algorithm is encapsulated in The Universal Recommender built on top of Apache PredictionIO. It has a server to accept events, like the ones in your example, it has internal event storage, and a query server for results. There are numerous improvements over the old Mapreduce code, including using real-time user behavior to make recommendations. Neither the new Mahout nor the old included servers for input and query, the Universal Recommender has REST endpoints for both.

Given that the code you are using will be deprecated I strongly suggest that you dive into Mahout code (@rawkintrevo's example) or look at The Universal Recommender, which is an entire end-to-end system.

  • Install PredictionIO with a "single machine" setup here or to really shortcut setup use our prepackaged AWS AMI here It includes PIO and The Universal Recommender pre-installed.
  • Add the UR Template here
  • A Java SDK for sending events to the recommender here

Once you have this setup you deal with config, REST or Java SDK and the PIO CLI. No Scala coding required.



回答2:

I have three examples that are based on version 0.13.0 (and Scala, which is required for Samsara, the R-Like Scala DSL Mahout utilizes v0.10+)

Walk

The first example is a very slow walk through: https://gist.github.com/rawkintrevo/3869030ff1a731d43c5e77979a5bf4a8 and is meant as a companion to Pat Ferrels blog post/slide deck found here. http://actionml.com/blog/cco

Crawl

The second example is a little more "real" in that it utilizes the SimilarityAnalysis.cooccurrencesIDSs(... which is the propper interface for the CCO algorithm.

https://gist.github.com/rawkintrevo/c1bb00896263bdc067ddcd8299f4794c

Run

Here we use 'real' data. The MovieLens data set doesn't have enough going on to showcase CCO's multi-modal power (the ability to recommend on multiple user behaviors). Here we load 'real' data and generate recommendations. https://gist.github.com/rawkintrevo/f87cc89f4d337d7ffea80a6af3bee83e

Conclusion I know you specifically asked for Java, however Apache Mahout isn't geared for Java at the moment. In theory you could import Scala into your java, or maybe wrap the functions in another more Java friendly function... I've heard rumors late at night (or possibly in a dream) that some grad students some where were working on a Java API, but its not in the trunk at the moment, nor is there a PR, nor is their a bullet in the road map.

Hope the above provides some insight.

Appendix

The most trivial example for Stackoverflow (you can run this interactively in the Mahout spark shell by typing $MAHOUT_HOME/bin/mahout spark-shell (assuming SPARK_HOME, JAVA_HOME and MAHOUT_HOME are set):

val inputRDD = sc.parallelize(Array(  ("u1", "purchase",    "iphone"),
  ("u1","purchase","ipad"),
  ("u2","purchase","nexus"),
  ("u2","purchase","galaxy"),
  ("u3","purchase","surface"),
  ("u4","purchase","iphone"),
  ("u4","purchase","galaxy"),
  ("u1","category-browse","phones"),
  ("u1","category-browse","electronics"),
  ("u1","category-browse","service"),
  ("u2","category-browse","accessories"),
  ("u2","category-browse","tablets"),
  ("u3","category-browse","accessories"),
  ("u3","category-browse","service"),
  ("u4","category-browse","phones"),
  ("u4","category-browse","tablets")) )



import org.apache.mahout.math.indexeddataset.{IndexedDataset, BiDictionary}
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark

val purchasesIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "purchase").map(o => (o._1, o._3)))(sc)
val browseIDS = IndexedDatasetSpark.apply(inputRDD.filter(_._2 == "category-browse").map(o => (o._1, o._3)))(sc)

import org.apache.mahout.math.cf.SimilarityAnalysis

val llrDrmList = SimilarityAnalysis.cooccurrencesIDSs(Array(purchasesIDS, browseIDS),
  randomSeed = 1234,
  maxInterestingItemsPerThing = 3,
  maxNumInteractions = 4)

val llrAtA = llrDrmList(0).matrix.collect

IndexedDatasetSpark.apply( requires an RDD[(String, String)] where the first string is the 'row' (e.g. users), second string is the 'behavior' so for the 'buy matrix', the columns would be 'products', but this could also be a 'gender' matrix, with two columns (male/female)

Then you pass an array of IndexedDataSets to SimilarityAnalysis.cooccurrencesIDSs(