How use of == operator brings in performance impro

2020-07-13 07:17发布

问题:

In Effective JAVA by Joshua Bloch, when I was reading about static factory methods , there was a statement as follows

The ability of static factory methods to return the same object from repeated invocations allows classes to maintain strict control over what instances exist at any time. Classes that do this are said to be instance-controlled. There are several reasons to write instance-controlled classes. Instance control allows a class to guarantee that it is a singleton (Item 3) or noninstantiable (Item 4). Also, it allows an immutable class (Item 15) to make the guarantee that no two equal instances exist: a.equals(b) if and only if a==b. If a class makes this guarantee, then its cli- ents can use the == operator instead of the equals(Object) method, which may result in improved performance. Enum types (Item 30) provide this guarantee.

To investigate how == operator brings in performance improvements , I got to look at String.java

I saw this snippet

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String) anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                            return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

By performance improvement what does he mean here ? how it brings performance improvement .

Does he mean to say the following

If every class can assure that a.equals(b) if and only if a==b , it means it brings in an indirect requirement that there cannot be objects referring to 2 different memory spaces and still hold the same data , which is memory wastage . If they hold same data they are one and the same object .That is they point to same memory location.

Am I right in this inference ?

If I am wrong can you guide me in understanding this ?

回答1:

If every class can assure that a.equals(b) if and only if a==b , it means it brings in an indirect requirement that there cannot be objects referring to 2 different memory spaces and still hold the same data , which is memory wastage . If they hold same data they are one and the same object .That is they point to same memory location.

Yes, that is what the author is driving at.

If you can (for a given class, this won't be possible for all, in particular it cannot work for mutable classes) call == (which is single JVM opcode) instead of equals (which is a dynamically dispatched method call), it saves (some) overhead.

It works this way for enums for example.

And even if someone called the equals method (which would be good defensive programming practice, you don't want to get into the habit of using == for objects IMHO), that method could be implemented as a simple == (instead of having to look at potentially complex object state).

Incidentally, even for "normal" equals methods (such as String's), it is probably a good idea in their implementation to first check for object identity and then short-cut looking at object state (which is what String#equals does, as you have found out).



回答2:

What the quoted portion means is that an immutable class can choose to intern its instances. This is easy to implement via Guava's Interner, for example:

public class MyImmutableClass {
    private static final Interner<MyImmutableClass> INTERN_POOL = Interners.newWeakInterner();
    private final String foo;
    private final int bar;

    private MyImmutableClass(String foo, int bar) {
        this.foo = foo;
        this.bar = bar;
    }

    public static MyImmutableClass of(String foo, int bar) {
        return INTERN_POOL.intern(new MyImmutableClass(foo, bar));
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(foo, bar);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;        // fast path for interned instances
        if (o instanceof MyImmutableClass) {
            MyImmutableClass rhs = (MyImmutableClass) o;
            return Objects.equal(foo, rhs.foo)
                    && bar == rhs.bar;
        }
        return false;
    }
}

Here, the constructor is made private: all instances have to be through the MyImmutableClass.of() factory method, which uses the Interner to ensure that if the new instance is equals() to an existing instance, the existing instance is returned instead.

Interning can only be used for immutable objects, by which I mean objects whose observable state (i.e., the behaviour of all its externally-accessible methods, in particular equals() and hashCode()) does not change for the objects' lifetimes. If you intern mutable objects, the behaviour will be wrong when an instance is modified.

As many other people have already stated, you should carefully choose which objects to intern, even if they're immutable. Only do it if the set of interned values is small relative to the number of duplicates you are likely to have. For example, it's not worth interning Integer generally, because there are over 4 billion possible values. But it is worth interning the most commonly-used Integer values, and in fact, Integer.valueOf() interns values between -128 and 127. On the other hand, enums are great to intern (and they are interned, by definition) because the set of possible values is small.

For most classes in general, you'd have to do heap analysis, such as by using jhat (or, to plug my own project, fasthat), to decide if there are enough duplicates to warrant interning. In other cases, just keep it simple and don't intern.



回答3:

If you can guarantee that no two instances of an object exist such that their semantic values are equivalent (i.e. if x and y refer to different instances [x != y] then x.equals(y) == false for all x and y), then this implies that you can compare two references' objects for equality simply by checking to see if they refer to the same instance, which is what == does.

The implementation of == essentially just compares two integers (memory addresses) and generally would be faster than virtually all nontrivial implementations of .equals().

It is worth noting that this is not a jump that can be made for Strings, as you cannot guarantee that any two instances of a String are not equivalent, e.g.:

String x = new String("hello");
String y = new String("hello");

Since x != y && x.equals(y), it is not sufficient to just do x == y to check for equality.



回答4:

To answer your questions ...

By performance improvement what does he mean here [String]? How it brings performance improvement.

This is NOT an example of what Bloch is talking about. Bloch is talking about instance-controlled classes, and String is not such a class!

Am I right in this inference?

Yes that is correct. An instance-controlled class for which the instances are immutable can ensure that objects that are "the same" will always be equal according to the == operator.

Some observations though:

  • This only applies to immutable objects. Or more precisely to objects where mutation does not affect the semantics of equality.

  • This only applies to fully instance-controlled classes.

  • Instance control can be expensive. Consider the form of (partial) instance control provided by the String class's intern method and the string pool.

    • The string pool is effectively a hash table of weak references to String objects. This occupies extra memory.

    • Each time you intern a String, it will calculate the string's hash code and probe the hash table to see if a similar string has already been intern'd

    • Each time a full GC is performed, the weak references in the string pool result in extra "tracing" work for the GC, and then potentially more work if the GC decides to break references.

    You typically get similar overheads when you implement your own instance-controlled classes. When you do cost-benefit analysis, these overheads count against the benefits of faster instance comparison.



回答5:

I think it means this:

If you need to test two complex structures for equality you generally need to do a lot of tests to make sure they are the same.

But if because of some trick of the language you knew that two complex but equal structures can't exist simultaneously then instead of verifying equality by comparing them bit by bit you can just verify that they are in the same location in memory and return false if they are not.

If anyone can create objects then you can't guarantee that two objects can't be created that are the same but are distinct instances.. but if you control the creation of objects and only create distinct objects then you don't need complex equality tests.



回答6:

In cases where complicated values are encapsulated using references to immutable objects, there are generally three scenarios that can arise when comparing two references:

  • They are references to the same object (very fast)

  • They are references to different objects which encapsulate different values (often fast, but sometimes slow)

  • They are references to different objects which encapsulate the same value (generally always slow)

If objects will be found to be equal more often than not, there can be substantial value to minimizing the frequency of case 3. If objects will often be very nearly equal, there can also be substantial value to ensuring that the slow subcases of case 2 don't happen very often.

If one makes certain that for any given value there will never be more than one object which holds that value, code which observes that two references identify different objects may infer that they encapsulate different values, without having to actually examine the values in question. The value of doing this is often somewhat limited, however. If the objects in question are large, complicated, nested collections which will sometimes be very similar, one may have each collection compute and cache a 128-bit hash of its contents; two collections with different content are unlikely to have matching hash values, and collections with different hash values may quickly recognized as unequal. On the other hand, having references that encapsulate the same content generally identify to the same object, even if a few references to identical collections exist, can improve the performance of the otherwise-always-bad "equals" case.

An approach that one could use if one didn't want to use a separate interning collection would be to have each object keep a long sequence number such that one can always determine which of two otherwise-identical objects was created first, along with a reference to the oldest object which is known to hold the same content. To compare two references, start by identifying the oldest object known to be equivalent to each. If oldest object known to match the first isn't the same as the oldest object known to match the second, compare the objects' contents. If they match, one will be newer than the other, and that object can regard the other as the "oldest object known to match".