Java - Initialize a HashMap of HashMaps

Posted 2020-02-10 14:25

I am new to Java and am practicing by writing a simple Naive Bayes classifier. I am still new to object instantiation and wonder how to initialize a HashMap of HashMaps. When inserting new observations into the classifier, I can create a new inner HashMap for an unseen feature name in a given class, but do I also need to initialize the outer map?

import java.util.HashMap;

public class NaiveBayes {

    private HashMap<String, Integer> class_counts;
    private HashMap<String, HashMap<String, Integer>> class_feature_counts;

    public NaiveBayes() {
        class_counts = new HashMap<String, Integer>();
        // do I need to initialize class_feature_counts?
    }

    public void insert() {
        // todo
        // I think I can create new hashmaps on the fly here for class_feature_counts
    }

    public String classify() {
        // stub 
        return "";
    }

    // Naive Scoring:
    // p( c | f_1, ... f_n) =~ p(c) * p(f_1|c) ... * p(f_n|c)
    private double get_score(String category, HashMap features) {
        // stub
        return 0.0;
    }

    public static void main(String[] args) {
        NaiveBayes bayes = new NaiveBayes();
        // todo
    }
}

Note that this question is not specific to Naive Bayes classifiers; I just thought I would provide some context.

4 Answers
Evening l夕情丶
Reply #2 · 2020-02-10 15:07

Recursive generic data structures, like maps of maps, are not an outright bad idea, but they often indicate something you could refactor: the inner map could often be a first-class object (with named fields or an internal map) rather than a bare map. You will still have to initialize these inner objects, but this is often a much cleaner, clearer way to develop.

For instance, if you have a Map<A,Map<B,C>>, you are often really storing a map of A to Thing, and Thing just happens to be stored as a map. You will often find it cleaner and easier to hide the fact that Thing is a map and instead store a Map<A,Thing>, where Thing is defined as:

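import java.util.HashMap;
import java.util.Map;

public class Thing {
    // Map is guaranteed to be initialized if a Thing exists
    // (Map is an interface, so a concrete class such as HashMap must be instantiated)
    private Map<B,C> data = new HashMap<B,C>();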

    // operations on data, like get and put
    // now can have sanity checks you couldn't enforce when the map was public
}
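
As a rough sketch of how this could look for the feature counts in the question: the class name FeatureCounts and its methods increment and get are made up for illustration, and the code assumes Java 8 or later for Map.merge and Map.getOrDefault.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper; the name FeatureCounts and its methods are illustrative only.
class FeatureCounts {
    // Initialized at declaration, so the map exists whenever a FeatureCounts exists.
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one more observation of the feature; rejects null names up front.
    void increment(String feature) {
        if (feature == null) {
            throw new IllegalArgumentException("feature must not be null");
        }
        // Stores 1 for a new key, otherwise adds 1 to the existing count.
        counts.merge(feature, 1, Integer::sum);
    }

    // Returns 0 for a feature that has never been seen.
    int get(String feature) {
        return counts.getOrDefault(feature, 0);
    }
}
```

With this, the outer structure could be declared as Map<String, FeatureCounts>, and callers never touch the inner map directly.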

Also, look into Guava's Multimap and Multiset utilities. They are very useful for cases like this; in particular, they handle the inner-object initialization automatically. Of note for your case: just about any time you implement a Map<E, Integer>, you really want a Guava Multiset, which is cleaner and clearer.

我想做一个坏孩纸
Reply #3 · 2020-02-10 15:19

Yes, you need to initialize it.

class_feature_counts = new HashMap<String, HashMap<String, Integer>>();

When you want to add a value to class_feature_counts, you need to instantiate the inner map too:

HashMap<String, Integer> val = new HashMap<String, Integer>();
// Do what you want to do with val
class_feature_counts.put("myKey", val);
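
If you are on Java 8 or later, the "create the inner map on first use" step can be folded into the insert itself with Map.computeIfAbsent. This is only a sketch: the class NestedCounts and the methods insert and count are hypothetical names, not part of the question's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for per-class feature counts; names are illustrative only.
class NestedCounts {
    // The outer map must exist before anything can be put into it.
    private final Map<String, Map<String, Integer>> classFeatureCounts = new HashMap<>();

    // Creates the inner map the first time a category is seen, then bumps the count.
    void insert(String category, String feature) {
        classFeatureCounts
                .computeIfAbsent(category, k -> new HashMap<>())
                .merge(feature, 1, Integer::sum);
    }

    // Returns 0 when either the category or the feature is unseen.
    int count(String category, String feature) {
        Map<String, Integer> inner = classFeatureCounts.get(category);
        return inner == null ? 0 : inner.getOrDefault(feature, 0);
    }
}
```

The outer map still has to be initialized explicitly; only the inner maps are created on demand.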
我只想做你的唯一
Reply #4 · 2020-02-10 15:28
  1. Do not declare your variables with the concrete type HashMap; it's too limiting. Declare them as Map instead.
  2. Yes, you need to initialize class_feature_counts. You'll be adding entries to it, so it has to be a valid map. In fact, initialize both at declaration and not in the constructor since there is only one way for each to start. I hope you're using Java 7 by now; it's simpler this way.

    private Map<String, Integer> classCounts = new HashMap<>();

    private Map<String, Map<String, Integer>> classFeatureCounts = new HashMap<>();

The compiler infers the type arguments for the <> from the declared type of each variable. Also, I changed the variable names to standard Java camel-case style. Are classCounts and classFeatureCounts connected?

手持菜刀,她持情操
Reply #5 · 2020-02-10 15:32

You must create an object before using it via a reference variable. It doesn't matter how complex that object is. You aren't required to initialize it in the constructor, although that is the most common case. Depending on your needs, you might want to use "lazy initialization" instead.
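
A minimal sketch of what lazy initialization might look like here, assuming Java 8 or later and single-threaded use (it is not thread-safe as written). The class LazyCounts and its methods are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example of lazy initialization; not thread-safe.
class LazyCounts {
    // Deliberately left null until the first insert needs it.
    private Map<String, Map<String, Integer>> classFeatureCounts;

    // Creates the outer map on first access and reuses it afterwards.
    private Map<String, Map<String, Integer>> counts() {
        if (classFeatureCounts == null) {
            classFeatureCounts = new HashMap<>();
        }
        return classFeatureCounts;
    }

    // Reports whether the outer map has been created yet.
    boolean isInitialized() {
        return classFeatureCounts != null;
    }

    // Goes through counts(), so the outer map is never dereferenced while null.
    void insert(String category, String feature) {
        counts().computeIfAbsent(category, k -> new HashMap<>())
                .merge(feature, 1, Integer::sum);
    }

    // Reading does not trigger initialization; unseen entries count as 0.
    int count(String category, String feature) {
        if (classFeatureCounts == null) {
            return 0;
        }
        Map<String, Integer> inner = classFeatureCounts.get(category);
        return inner == null ? 0 : inner.getOrDefault(feature, 0);
    }
}
```

For a small map like this, initializing at declaration or in the constructor is usually simpler; lazy initialization mainly pays off when the object is expensive to create or often goes unused.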
