GetHashCode Extension Method

2019-01-31 02:55发布

问题:

After reading all the questions and answers on StackOverflow concerning overriding GetHashCode() I wrote the following extension method for easy and convenient overriding of GetHashCode():

public static class ObjectExtensions
{
    private const int _seedPrimeNumber = 691;
    private const int _fieldPrimeNumber = 397;
    public static int GetHashCodeFromFields(this object obj, params object[] fields) {
        unchecked { //unchecked to prevent throwing overflow exception
            int hashCode = _seedPrimeNumber;
            for (int i = 0; i < fields.Length; i++)
                if (fields[i] != null)
                    hashCode *= _fieldPrimeNumber + fields[i].GetHashCode();
            return hashCode;
        }
    }
}

(I basically only refactored the code that someone posted there, because I really like that it can be used generally)

which I use like this:

    public override int GetHashCode() {
        return this.GetHashCodeFromFields(field1, field2, field3);
    }

Do you see any problems with this code?

回答1:

That looks like a solid way to do it.

My only suggestion is that if you're really concerned about performance with it, you may want to add generic versions for several common cases (ie. probably 1-4 args). That way, for those objects (which are most likely to be small, key-style composite objects), you won't have the overhead of building the array to pass to the method, the loop, any boxing of generic values, etc. The call syntax will be exactly the same, but you'll run slightly more optimized code for that case. Of course, I'd run some perf tests over this before you decide whether it's worth the maintenance trade-off.

Something like this:

public static int GetHashCodeFromFields<T1,T2,T3,T4>(this object obj, T1 obj1, T2 obj2, T3 obj3, T4 obj4) {
    int hashCode = _seedPrimeNumber;
    if(obj1 != null)
        hashCode *= _fieldPrimeNumber + obj1.GetHashCode();
    if(obj2 != null)
        hashCode *= _fieldPrimeNumber + obj2.GetHashCode();
    if(obj3 != null)
        hashCode *= _fieldPrimeNumber + obj3.GetHashCode();
    if(obj4 != null)
        hashCode *= _fieldPrimeNumber + obj4.GetHashCode();
    return hashCode;
}


回答2:

I wrote some stuff a little while back that you might solve your problem... (And actually, it could probably be improved to include the seed that you have...)

Anyway, the project is called Essence ( http://essence.codeplex.com/ ), and it uses the System.Linq.Expression libraries to generate (based on attributes) standard representations of Equals/GetHashCode/CompareTo/ToString, as well as being able to create IEqualityComparer and IComparer classes based on an argument list. (I also have some further ideas, but would like to get some community feedback before continuing too much further.)

(What this means is that it's almost as fast as being handwritten - the main one where it isn't is the CompareTo(); cause the Linq.Expressions doesn't have the concept of a variable in the 3.5 release - so you have to call CompareTo() on the underlying object twice when you don't get a match. Using the DLR extensions to Linq.Expressions solves this. I suppose I could have used the emit il, but I wasn't that inspired at the time.)

It's quite a simple idea, but I haven't seen it done before.

Now the thing is, I kind of lost interest in polishing it (which would have included writing an article for codeproject, documenting some of the code, or the like), but I might be persuaded to do so if you feel it would be something of interest.

(The codeplex site doesn't have a downloadable package; just go to the source and grab that - oh, it's written in f# (although all the test code is in c#) as that was the thing I was interested in learning.)

Anyway, here is are c# example from the test in the project:

    // --------------------------------------------------------------------
    // USING THE ESSENCE LIBRARY:
    // --------------------------------------------------------------------
    [EssenceClass(UseIn = EssenceFunctions.All)]
    public class TestEssence : IEquatable<TestEssence>, IComparable<TestEssence>
    {
        [Essence(Order=0] public int MyInt           { get; set; }
        [Essence(Order=1] public string MyString     { get; set; }
        [Essence(Order=2] public DateTime MyDateTime { get; set; }

        public override int GetHashCode()                                { return Essence<TestEssence>.GetHashCodeStatic(this); }
    ...
    }

    // --------------------------------------------------------------------
    // EQUIVALENT HAND WRITTEN CODE:
    // --------------------------------------------------------------------
    public class TestManual
    {
        public int MyInt;
        public string MyString;
        public DateTime MyDateTime;

        public override int GetHashCode()
        {
            var x = MyInt.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= (MyString == null) ? 0 : MyString.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= MyDateTime.GetHashCode();
            return x;
        }
    ...
    }

Anyway, the project, if anyone thinks is worthwhile, needs polishing, but the ideas are there...



回答3:

I looks pretty good to me, I only have one issue: It is a shame that you have to use an object[] to pass in the values as this will box any value types you send to the function. I don't think you have much of a choice though, unless you go the route of creating some generic overloads like others have suggested.



回答4:

On general principle you should scope your unchecked as narrowly as you reasonably can, though it doesn't matter much here. Other than that, looks fine.



回答5:

public override int GetHashCode() {
    return this.GetHashCodeFromFields(field1, field2, field3, this);
}

(yes, I'm very pedantic but this is the only problem that I see)



回答6:

More optimal:

  1. Create a code generator that uses reflection to look through your business object fields and creates a new partial class which overrides GetHashCode() (and Equals()).
  2. Run the code generator when your program starts up in debug mode, and if the code has changed, exit with a message to the developer to recompile.

The advantages of this are:

  • Using reflection you know which fields are value types or not, and hence whether they need null checks.
  • There are no overheads - no extra function calls, no list construction, etc. This is important if you are doing lots of dictionary lookups.
  • Long implementations (in classes with lots of fields) are hidden in partial classes, away from your important business code.

Disadvantages:

  • Overkill if you don't do lots of dictionary lookups/calls to GetHashCode().


回答7:

I should point out that you should almost never do allocation while implementing GetHashCode (here's some useful blog posts about it).

The way that params works (generating a new array on the fly) means this is really not a good general solution. You would be better using a method call per field and maintaiing the hash state as a variable passed to them (this makes it easy to use better hashing functions and avalanching too).



回答8:

Apart from the problems arising from using params object[] fields, I think not using the type information may be a performance issue in some situations too. Suppose two classes A, B have the same type and number of fields and implement the same interface I. Now if you put A and B objects to a Dictionary<I, anything> objects with equal fields and different types will end up in the same bucket. I'd probably insert some statement like hashCode ^= GetType().GetHashCode();

Jonathan Rupp's accepted answer deals with params array but do not deal with boxing of value types. So, if performance is very important I'd probably declare GetHashCodeFromFields having not object but int parameters, and send not the fields themselves but the hash codes of the fields. i.e.

public override int GetHashCode() 
{
    return this.GetHashCodeFromFields(field1.GetHashCode(), field2.GetHashCode());
}


回答9:

One problem that could arise is when multiplication hits 0, final hashCode is always 0, as I just experienced with an object with a lot of properties, in the following code :

hashCode *= _fieldPrimeNumber + fields[i].GetHashCode();

I'd suggest :

hashCode = hashCode * _fieldPrimeNumber + fields[i].GetHashCode();

Or something similar with xor like this :

hashCode = hashCode * _fieldPrimeNumber ^ fields[i].GetHashCode();