Compound String key in HashMap

2019-05-20 13:01发布

问题:

We are storing a String key in a HashMap that is a concatenation of three String fields and a boolean field. Problem is duplicate keys can be created if the delimiter appears in the field value.

So to get around this, based on advice in another post, I'm planning on creating a key class which will be used as the HashMap key:

class TheKey {
  public final String k1;
  public final String k2;
  public final String k3;
  public final boolean k4;

  public TheKey(String k1, String k2, String k3, boolean k4) {
    this.k1 = k1; this.k2 = k2; this.k3 = k3; this.k4 = k4;
  }

  public boolean equals(Object o) {
      TheKey other = (TheKey) o;
      //return true if all four fields are equal
  }

  public int hashCode() {
    return ???;  
  }
}

My questions are:

  1. What value should be returned from hashCode(). The map will hold a total of about 30 values. Of those 30, there are about 10 distinct values of k1 (some entries share the same k1 value).
  2. To store this key class as the HashMap key, does one only need to override the equals() and hashCode() methods? Is anything else required?

回答1:

Just hashCode and equals should be fine. The hashCode could look something like this:

public int hashCode() {
  int hash = 17;
  hash = hash * 31 + k1.hashCode();
  hash = hash * 31 + k2.hashCode();
  hash = hash * 31 + k3.hashCode();
  hash = hash * 31 + k4 ? 0 : 1;
  return hash;
}

That's assuming none of the keys can be null, of course. Typically you could use 0 as the "logical" hash code for a null reference in the above equation. Two useful methods for compound equality/hash code which needs to deal with nulls:

public static boolean equals(Object o1, Object o2) {
  if (o1 == o2) {
    return true;
  }
  if (o1 == null || o2 == null) {
    return false;
  }
  return o1.equals(o2);
}

public static boolean hashCode(Object o) {
  return o == null ? 0 : o.hashCode();
}

Using the latter method in the hash algorithm at the start of this answer, you'd end up with something like:

public int hashCode() {
  int hash = 17;
  hash = hash * 31 + ObjectUtil.hashCode(k1);
  hash = hash * 31 + ObjectUtil.hashCode(k2);
  hash = hash * 31 + ObjectUtil.hashCode(k3);
  hash = hash * 31 + k4 ? 0 : 1;
  return hash;
}


回答2:

In Eclipse you can generate hashCode and equals by Alt-Shift-S h.



回答3:

Ask Eclipse 3.5 to create the hashcode and equals methods for you :)



回答4:

this is how a well-formed equals class with equals ans hashCode should look like: (generated with intellij idea, with null checks enabled)

class TheKey {
public final String k1;
public final String k2;
public final String k3;
public final boolean k4;

public TheKey(String k1, String k2, String k3, boolean k4) {
    this.k1 = k1;
    this.k2 = k2;
    this.k3 = k3;
    this.k4 = k4;
}

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    TheKey theKey = (TheKey) o;

    if (k4 != theKey.k4) return false;
    if (k1 != null ? !k1.equals(theKey.k1) : theKey.k1 != null) return false;
    if (k2 != null ? !k2.equals(theKey.k2) : theKey.k2 != null) return false;
    if (k3 != null ? !k3.equals(theKey.k3) : theKey.k3 != null) return false;

    return true;
}

@Override
public int hashCode() {
    int result = k1 != null ? k1.hashCode() : 0;
    result = 31 * result + (k2 != null ? k2.hashCode() : 0);
    result = 31 * result + (k3 != null ? k3.hashCode() : 0);
    result = 31 * result + (k4 ? 1 : 0);
    return result;
}
}


回答5:

The implementation of your hashCode() doesn't matter much unless you make it super stupid. You could very well just return the sum of all the strings hash codes (truncated to an int) but you should make sure you fix this:

  1. If your hash code implementation is slow, consider caching it in the instance. Depending on how long your key objects stick around and how they are used with the hash table when you get things out of it you may not want to spend longer than necessary calculating the same value over and over again. If you stick with Jon's implementation of hashCode() there is probably no need for it as String already cache its hashCode() for you.
    This is however more of a general advice, since the mid 90's I've seen quite a few developers get stung on slow (and even worse, changing) hashCode() implementations.

  2. Don't be sloppy when you create the equals() implementation. Your equals() above will be both ineffective and flawed. First of all you don't need to compare the values if the objects have different hash codes. You should also return false (and not a null pointer exception) if you get a null as the argument.

The rules are simple, this page will walk you through them.

Edit: I have to ask one more thing... You say "Problem is duplicate keys can be created if the delimiter appears in the field value". Why is that? If the format is key+delimiter+key+delimiter+key it really doesn't matter if there are one or more delimiters in the keys unless you get really unlucky with a combination of two keys and in that case you probably should have selected another delimiter (there are quite a few to choose from in unicode).

Anyway, Jon is right in his comment below... Don't do caching "until you've proven it's a good thing". It is a good practice always.



回答6:

Have you taken a look at the specifications of hashCode()? Perhaps this will give you a better idea of what the function should return.



回答7:

I do not know if this is an option for you but apache commons library provides an implementation for MultiKeyMap



回答8:

For the hashCode, you could instead use something like

k1.hashCode() ^ k2.hashCode() ^ k3.hashCode() ^ k4.hashCode()

XOR is entropy-preserving, and this incorporates k4's hashCode in a much better way than the previous suggestions. Just having one bit of information from k4 means that if all your composite keys have identical k1, k2, k3 and only differing k4s, your hash codes will all be identical and you'll get a degenerate HashMap.



回答9:

I thought your main concern was speed (based on your original post)? Why don't you just make sure you use a separator which does not occur in your (handfull of) field values? Then you can just create String key using concatenation and do away with all this 'key-class' hocus pocus. Smells like serious over-engineering to me.



标签: java map key