How does String.equals() work?

Posted 2020-04-11 10:22

Question:

I have been trying to understand how some of the Java API methods work.

Below is the equals method of the java.lang.String class.

Can someone explain how this code actually compares two strings? I understand the purpose of count, but what does offset signify, and how do these variables get their values?

For example, when I create a String, how are they initialized?

I am looking for a detailed line-by-line description, and also an explanation of how and when the instance variables (value, count, offset, etc.) are initialized.

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}

Answer 1:

Logically,

while (n-- != 0) {
    if (v1[i++] != v2[j++])
        return false;
}

is the same as

for (int k = 0; k < n; k++) {
    if (v1[i + k] != v2[j + k])
        return false;
}

I am not sure why the JDK authors wrote it this way. Perhaps a while loop performed better than a for loop at the time. It looks quite C-like to me, so maybe the person who wrote it had a background in C.
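As a quick check of that equivalence, here is a minimal sketch that runs both loop shapes over the same arrays. The class and method names (LoopEquivalence, regionEqualsWhile, regionEqualsFor) are made up for illustration and are not part of the JDK.

```java
final class LoopEquivalence {
    // Count-down form, shaped like the loop in the quoted equals method.
    static boolean regionEqualsWhile(char[] v1, int i, char[] v2, int j, int n) {
        while (n-- != 0) {
            if (v1[i++] != v2[j++]) {
                return false;
            }
        }
        return true;
    }

    // Index-based form: k counts how many characters have been compared so far.
    static boolean regionEqualsFor(char[] v1, int i, char[] v2, int j, int n) {
        for (int k = 0; k < n; k++) {
            if (v1[i + k] != v2[j + k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] a = "hello world".toCharArray();
        char[] b = "world".toCharArray();
        // "world" starts at index 6 of a, so both forms report a match there.
        System.out.println(regionEqualsWhile(a, 6, b, 0, 5)); // true
        System.out.println(regionEqualsFor(a, 6, b, 0, 5));   // true
        // Starting at index 0 compares "hello" with "world".
        System.out.println(regionEqualsWhile(a, 0, b, 0, 5)); // false
        System.out.println(regionEqualsFor(a, 0, b, 0, 5));   // false
    }
}
```

Both methods visit the same pairs of positions; the only difference is whether the indices are advanced in place or derived from a separate counter.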

offset locates where the string starts within the char array. Internally, a String stores its characters in a char array; this is the value field. The test

if (v1[i++] != v2[j++])
    return false;

compares the characters in the two strings' underlying char arrays, one position at a time.
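To make the role of offset concrete, here is a small sketch of a class laid out the same way as the fields in the question. CharSlice is a hypothetical stand-in written for this answer, not the real java.lang.String, whose private fields cannot be set from outside.

```java
// Hypothetical stand-in for the value/offset/count layout; not the real java.lang.String.
final class CharSlice {
    private final char[] value; // character storage, possibly shared with other slices
    private final int offset;   // first index of value used by this slice
    private final int count;    // number of characters in this slice

    CharSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CharSlice)) return false;
        CharSlice that = (CharSlice) other;
        if (count != that.count) return false;
        // Each side starts reading at its own offset.
        for (int k = 0; k < count; k++) {
            if (value[offset + k] != that.value[that.offset + k]) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        int h = 0;
        for (int k = 0; k < count; k++) h = 31 * h + value[offset + k];
        return h;
    }

    public static void main(String[] args) {
        char[] shared = "hello world".toCharArray();
        CharSlice world = new CharSlice(shared, 6, 5);                   // "world" inside the shared array
        CharSlice other = new CharSlice("world".toCharArray(), 0, 5);    // "world" in its own array
        System.out.println(world.equals(other));                         // true
        System.out.println(world.equals(new CharSlice(shared, 0, 5)));   // false: that slice is "hello"
    }
}
```

Two slices with different arrays and different offsets still compare equal as long as the characters they cover are the same, which is exactly why equals needs both i and j.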

Line by line:

If both references point to the same object, the strings must be equal.

if (this == anObject) {
    return true;
}

If the object is a String, go on to compare the contents.

if (anObject instanceof String) {

Cast the parameter to String.

    String anotherString = (String)anObject;

Remember the length of this string.

    int n = count;

Continue only if the two strings have the same length.

    if (n == anotherString.count) {

Get the arrays of characters (value is this array).

        char v1[] = value;
        char v2[] = anotherString.value;

Find out where in each array the string starts.

        int i = offset;
        int j = anotherString.offset;

Loop through the char arrays. If two characters differ, return false.

        while (n-- != 0) {
            if (v1[i++] != v2[j++])
                return false;
        }

Every character matched, so the strings are equal.

        return true;
    }
}

If the argument is not a String, or the lengths differ, the strings cannot be equal.

return false;
}
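The branches walked through above can be observed from outside with ordinary strings. This is a small sketch using only the public String API; the class name EqualsCases is just for illustration.

```java
final class EqualsCases {
    public static void main(String[] args) {
        String s = "hello";
        Object same = s;

        System.out.println(s.equals(same));                       // true: same reference
        System.out.println(s.equals(new String("hello")));        // true: different object, same characters
        System.out.println(s.equals("hell"));                     // false: lengths differ
        System.out.println(s.equals("hellO"));                    // false: last character differs
        System.out.println(s.equals(new StringBuilder("hello"))); // false: argument is not a String
        System.out.println(s.equals(null));                       // false: null is not an instance of String
    }
}
```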

To understand offset and value, look at the fields of the String class:

/** The value is used for character storage. */
private final char value[];

/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;

The constructors initialize these variables. The default constructor is shown below; the other constructors do something similar.

/**
  * Initializes a newly created {@code String} object so that it represents
  * an empty character sequence.  Note that use of this constructor is
  * unnecessary since Strings are immutable.
  */
 public String() {
    this.offset = 0;
    this.count = 0;
    this.value = new char[0];
 }


Answer 2:

As you may know, string handling in Java is a special case: in the JDK version this code comes from, a String does not have to own its whole char array, and several strings can share one array. For the char array "I am Learning Java", one String might refer to all of "I am Learning Java", so its offset is 0, while another might refer only to "am", so its offset is 2. In that version this kind of sharing comes from methods such as substring(), which reuse the parent's array with a different offset and count. The offset is set by the constructor that creates the new String.

Also as you can see from the code

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off + size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

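When a new String is created from an existing one, the original may be using only part of a larger shared array. That is why the constructor reads the offset first and then copies just the range in use into a new array, so the new string does not carry the unused part (the "baggage" mentioned in the code comment). If the original already uses its whole array, the array is reused as is, which is safe because Strings are immutable. Either way the new string's offset is 0.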
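Whatever happens to the backing array internally, the visible effect of this constructor is a new object with the same contents. A minimal sketch using only the public API (the class name CopyConstructorDemo is illustrative):

```java
final class CopyConstructorDemo {
    public static void main(String[] args) {
        String literal = "Java";           // string literals are interned in the string pool
        String copy = new String(literal); // always creates a new String object

        System.out.println(literal == copy);          // false: two distinct objects
        System.out.println(literal.equals(copy));     // true: same characters
        System.out.println(literal == copy.intern()); // true: intern() returns the pooled instance
    }
}
```

This is also why equals checks this == anObject first: it is a cheap shortcut for the same-object case, while the character loop handles distinct objects with equal contents.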

Also, remember that String is a class, not a primitive type, and in this implementation its characters are always stored in a char array, so an offset is needed to determine where the string starts within that array.