Unexpected behavior of Substring in C# [duplicate]

Posted 2020-08-09 08:28

Question:

The Substring() method in the .NET System.String class is defined like this:

public string Substring(int startIndex)

Here startIndex is "The zero-based starting character position of a substring in this instance", as per the method definition. If I understand it correctly, this means it will give me a part of the string, starting at the given zero-based index.

Now, if I have the string "ABC" and take substrings with different indexes, I get the following results.

var str = "ABC";
var chars = str.ToArray(); // needs using System.Linq; returns 3 chars 'A', 'B', 'C' as expected

var sub2 = str.Substring(2); //[1] returns "C" as expected
var sub3 = str.Substring(3); //[2] returns "" ...!!! Why no exception??
var sub4 = str.Substring(4); //[3] throws ArgumentOutOfRangeException as expected

Why doesn't it throw an exception for case [2]?

The string has 3 characters, so the indexes are [0, 1, 2], and even the ToArray() and ToCharArray() methods return 3 characters, as expected. Shouldn't it throw an exception if I call Substring() with a starting index of 3?

Answer 1:

The documentation is quite explicit about this being correct behaviour:

Return value: a string that is equivalent to the substring that begins at startIndex in this instance, or Empty if startIndex is equal to the length of this instance.

Throws ArgumentOutOfRangeException if startIndex is less than zero or greater than the length of this instance.

In other words, taking a substring starting just beyond the final character will give you an empty string.

Your expectation that it gives you a part of the string is not incompatible with this. A "part of the string" includes all substrings of zero length as well, as evidenced by the fact that s.Substring(n, 0) also gives an empty string for any valid n.
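As a quick check of the documented boundaries, here is a minimal sketch. The class name SubstringBoundaryDemo and the helper Throws are made up for illustration and are not part of the framework.

```csharp
using System;

public static class SubstringBoundaryDemo
{
    // Hypothetical helper: reports whether s.Substring(startIndex)
    // throws ArgumentOutOfRangeException.
    public static bool Throws(string s, int startIndex)
    {
        try
        {
            s.Substring(startIndex);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        string str = "ABC";

        Console.WriteLine(str.Substring(2) == "C");             // True
        Console.WriteLine(str.Substring(3) == string.Empty);    // True: startIndex == Length
        Console.WriteLine(str.Substring(3, 0) == string.Empty); // True: zero-length substring
        Console.WriteLine(Throws(str, 4));                      // True: startIndex > Length
        Console.WriteLine(Throws(str, -1));                     // True: startIndex < 0
    }
}
```

The only index that is neither a character position nor an error is startIndex == Length, which is exactly the case the documentation calls out.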



Answer 2:

There are lots of technical answers here saying how the framework handles the method call, but I'd like to give a reasoning by analogy for why it is like it is.

Consider the string as a fence where the fence panels themselves are the characters, held up with fence posts numbered as shown below:

0   1   2   3
| A | B | C |   "ABC"

0   1   2   3   4   5   6   7   8   9
| M | y |   | S | t | r | i | n | g |   "My String"

In this analogy, string.Substring(n) returns a string of panels starting at fence post n. Notice that the last character of the string has a fence post after it. Calling the function with this fence post returns a value stating that there are no fence panels after this point (i.e. it returns the empty string).

Similarly, string.Substring(n, l) returns a string of l panels starting at fence post n. This is why something like "ABC".Substring(2, 0) returns "", too.
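The fence post picture can be sketched in code: a string of n characters has n + 1 posts, and every one of them is a valid starting point. The names FencePostDemo and SuffixesAtEveryPost below are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FencePostDemo
{
    // Returns the substring starting at every fence post 0..s.Length inclusive.
    public static List<string> SuffixesAtEveryPost(string s)
    {
        var result = new List<string>();
        for (int post = 0; post <= s.Length; post++)
        {
            result.Add(s.Substring(post));
        }
        return result;
    }

    public static void Main()
    {
        foreach (string suffix in SuffixesAtEveryPost("ABC"))
        {
            Console.WriteLine($"\"{suffix}\"");
        }
        // Prints "ABC", "BC", "C" and "" on separate lines.
    }
}
```

Note that the loop condition is post <= s.Length, not post < s.Length: the final post is the one that yields the empty string.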



Answer 3:

Sometimes looking at the code can be handy:

First, this overload is called:

public string Substring(int startIndex)
{
    return this.Substring(startIndex, this.Length - startIndex);
}

The length argument is 0 because of the subtraction (this.Length - startIndex is 3 - 3):

public string Substring(int startIndex, int length)
{
    if (startIndex < 0)
    {
        throw new ...
    }
    if (startIndex > this.Length)
    {
        throw new ...
    }
    if (length < 0)
    {
        throw new ...
    }
    if (startIndex > (this.Length - length))
    {
        throw new ...
    }
    if (length == 0) // <-- NOTICE HERE
    {
        return Empty;
    }
    if ((startIndex == 0) && (length == this.Length))
    {
        return this;
    }
    return this.InternalSubString(startIndex, length);
}


Answer 4:

Based on what is written on MSDN:

Return Value: A string that is equivalent to the substring that begins at startIndex in this instance, or Empty if startIndex is equal to the length of this instance.

Exceptions: ArgumentOutOfRangeException if startIndex is less than zero or greater than the length of this instance.



Answer 5:

Looking at the String.Substring Method documentation, an empty string will be returned if the start index is equal to the length.

A string that is equivalent to the substring of length length that begins at startIndex in this instance, or Empty if startIndex is equal to the length of this instance and length is zero.



Answer 6:

Substring checks whether startIndex is greater than the length of the string, and only then throws the exception. In your case it is equal (the string length is 3). After that, it checks whether the length of the substring is zero and, if so, returns String.Empty. In your case the length of the substring is the length of the string (3) minus startIndex (3), so the length of the substring is 0 and an empty string is returned.
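The order of those checks can be written out as a small sketch. This is a hypothetical re-implementation for illustration only, not the framework's actual source; the names SubstringSketch and SubstringFrom are made up.

```csharp
using System;

public static class SubstringSketch
{
    // Hypothetical re-implementation of the steps described in this answer.
    public static string SubstringFrom(string s, int startIndex)
    {
        // Step 1: reject only indexes outside 0..Length (Length itself is allowed).
        if (startIndex < 0 || startIndex > s.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(startIndex));
        }

        // Step 2: the substring length is whatever remains after startIndex.
        int length = s.Length - startIndex;
        if (length == 0)
        {
            return string.Empty;
        }

        // Step 3: copy the remaining characters into a new string.
        return new string(s.ToCharArray(), startIndex, length);
    }

    public static void Main()
    {
        Console.WriteLine(SubstringFrom("ABC", 2));        // C
        Console.WriteLine(SubstringFrom("ABC", 3).Length); // 0
    }
}
```

With startIndex equal to 3 the first check passes, the computed length is 0, and the empty string comes back before any copying happens.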



Answer 7:

Conceptually, every string in C# ends with String.Empty.

Here is a good answer on this question.

From MSDN - String Class (System):

In the .NET Framework, a String object can include embedded null characters, which count as a part of the string's length. However, in some languages such as C and C++, a null character indicates the end of a string; it is not considered a part of the string and is not counted as part of the string's length.
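To illustrate the quoted remark, here is a small sketch showing that an embedded null character counts toward Length and that the Substring boundary rule is unchanged. The class name EmbeddedNullDemo and the constant Sample are assumptions made for this example.

```csharp
using System;

public static class EmbeddedNullDemo
{
    // 'A', a null character, then 'B': three characters in total.
    public const string Sample = "A\0B";

    public static void Main()
    {
        Console.WriteLine(Sample.Length);                        // 3
        Console.WriteLine(Sample.Substring(2) == "B");           // True
        Console.WriteLine(Sample.Substring(3) == string.Empty);  // True: startIndex == Length
    }
}
```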



Answer 8:

To supplement other answers, Mono also correctly implements this behavior.

public String Substring (int startIndex)
{
    if (startIndex == 0)
        return this;
    if (startIndex < 0 || startIndex > this.length)
        throw new ArgumentOutOfRangeException ("startIndex");

    return SubstringUnchecked (startIndex, this.length - startIndex);
}

// This method is used by StringBuilder.ToString() and is expected to
// always create a new string object (or return String.Empty). 
internal unsafe String SubstringUnchecked (int startIndex, int length)
{
    if (length == 0)
        return String.Empty;

    string tmp = InternalAllocateStr (length);
    fixed (char* dest = tmp, src = this) {
        CharCopy (dest, src + startIndex, length);
    }
    return tmp;
}

As you can see, it returns String.Empty if the length is equal to zero.