I am new to Java and practicing by writing a simplistic Naive Bayes classifier. I am still new to object instantiation, and wonder how to initialize a HashMap of HashMaps. When inserting new observations into the classifier, I can create a new inner HashMap for an unseen feature name in a given class, but do I also need to initialize the outer map first?
import java.util.HashMap;
public class NaiveBayes {
    private HashMap<String, Integer> class_counts;
    private HashMap<String, HashMap<String, Integer>> class_feature_counts;
    public NaiveBayes() {
        class_counts = new HashMap<String, Integer>();
        // do I need to initialize class_feature_counts?
    }
    public void insert() {
        // todo
        // I think I can create new hashmaps on the fly here for class_feature_counts
    }
    public String classify() {
        // stub
        return "";
    }
    // Naive Scoring:
    // p( c | f_1, ... f_n) =~ p(c) * p(f_1|c) ... * p(f_n|c)
    private double get_score(String category, HashMap features) {
        // stub
        return 0.0;
    }
    public static void main(String[] args) {
        NaiveBayes bayes = new NaiveBayes();
        // todo
    }
}
Note that this question is not specific to Naive Bayes classifiers; I just thought I would provide some context.
Recursive generic data structures, like maps of maps, while not an outright bad idea, are often a sign of something you could refactor: the inner map could often be a first-class object (with named fields or an internal map) rather than simply a map. You'll still have to initialize these inner objects, but it is often a much cleaner, clearer way to develop.
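As a rough illustration of that refactoring for the classifier in the question, the inner map could be hidden behind a small class. This is only a sketch: the name FeatureCounts and its methods are made up for this example, and merge and getOrDefault assume Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: callers see "feature counts", not a raw map.
class FeatureCounts {
    // The inner map is initialized once, at declaration.
    private final Map<String, Integer> counts = new HashMap<>();

    // Record one more observation of the given feature.
    void increment(String feature) {
        counts.merge(feature, 1, Integer::sum);
    }

    // How often the feature was seen; 0 if it was never seen.
    int get(String feature) {
        return counts.getOrDefault(feature, 0);
    }
}
```

The outer structure then becomes a Map<String, FeatureCounts>, and the code that uses it no longer needs to know that counting happens in a map.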
For instance, if you have a
Map<A,Map<B,C>>
you're often really storing a map of A to Thing, but the way Thing is being stored is coincidentally a map. You'll often find it cleaner and easier to hide the fact that Thing is a map, and instead store a mapping of
Map<A,Thing>
where Thing is a class of its own that keeps the inner map (or named fields) to itself.
Also, look into Guava's Multimap/Multiset utilities. They're very useful for cases like this; in particular, they do the inner-object initializations automatically. Of note for your case, just about any time you implement
Map<E, Integer>
you really want a Guava Multiset. Cleaner and clearer.
Yes, you need to initialize it.
When you want to add a value to class_feature_counts, you need to instantiate it too:
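A minimal sketch of that step, written in the pre-Java 7 style of the question. The class name InnerMapDemo and the insert(category, feature) and count helpers are hypothetical, introduced only for illustration.

```java
import java.util.HashMap;

class InnerMapDemo {
    // The outer map is created once, up front.
    private final HashMap<String, HashMap<String, Integer>> class_feature_counts =
            new HashMap<String, HashMap<String, Integer>>();

    // Record one observation of `feature` under `category`.
    void insert(String category, String feature) {
        HashMap<String, Integer> counts = class_feature_counts.get(category);
        if (counts == null) {
            // First time this class is seen: instantiate the inner map and store it.
            counts = new HashMap<String, Integer>();
            class_feature_counts.put(category, counts);
        }
        Integer old = counts.get(feature);
        counts.put(feature, old == null ? 1 : old + 1);
    }

    // Read a count back; 0 if the class or feature was never seen.
    int count(String category, String feature) {
        HashMap<String, Integer> counts = class_feature_counts.get(category);
        if (counts == null) {
            return 0;
        }
        Integer n = counts.get(feature);
        return n == null ? 0 : n;
    }
}
```

Without the null check, the first insert for a new class would call get on a null inner map and throw a NullPointerException.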
Don't declare your variables as HashMap. It's too limiting.
Yes, you need to initialize class_feature_counts. You'll be adding entries to it, so it has to be a valid map. In fact, initialize both at declaration and not in the constructor, since there is only one way for each to start. I hope you're using Java 7 by now; it's simpler this way.
private Map<String, Integer> classCounts = new HashMap<>();
private Map<String, Map<String, Integer>> classFeatureCounts = new HashMap<>();
The compiler will deduce the types from the <>. Also, I changed the variable names to standard Java camel-case style. Are classCounts and classFeatureCounts connected?
You must create an object before using it via a reference variable. It doesn't matter how complex that object is. You aren't required to initialize it in the constructor, although that is the most common case. Depending on your needs, you might want to use "lazy initialization" instead.
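A sketch of what lazy initialization could look like here, under a few assumptions: the class name LazyCounts and its methods are hypothetical, computeIfAbsent and merge require Java 8 or later, and the code is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

class LazyCounts {
    // Deliberately left null; the outer map is created on first use.
    private Map<String, Map<String, Integer>> classFeatureCounts;

    // Single access point that creates the outer map when it is first needed.
    private Map<String, Map<String, Integer>> featureCounts() {
        if (classFeatureCounts == null) {
            classFeatureCounts = new HashMap<>();
        }
        return classFeatureCounts;
    }

    // Inner maps are also created lazily, the first time a category appears.
    void insert(String category, String feature) {
        featureCounts()
                .computeIfAbsent(category, k -> new HashMap<>())
                .merge(feature, 1, Integer::sum);
    }

    // Reading never forces creation of the outer map.
    int count(String category, String feature) {
        if (classFeatureCounts == null) {
            return 0;
        }
        Map<String, Integer> inner = classFeatureCounts.get(category);
        return inner == null ? 0 : inner.getOrDefault(feature, 0);
    }

    boolean isInitialized() {
        return classFeatureCounts != null;
    }
}
```

For a field as cheap as an empty HashMap, initializing at declaration is usually simpler; lazy initialization tends to pay off only when the object is expensive to build or often unused.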