Are C# Strings (and other .NET APIs) limited to 2GB in size?

Posted 2019-01-26 10:03

Today I noticed that C#'s String class returns the length of a string as an int. Since an int is always 32 bits, no matter what the architecture, does this mean that a string can only be 2GB or less in length?

A 2GB string would be very unusual, and would present many problems of its own. However, most .NET APIs seem to use 'int' to convey values such as length and count. Does this mean we are forever limited to collection sizes which fit in 32 bits?

Seems like a fundamental problem with the .NET APIs. I would have expected things like count and length to be returned via the equivalent of 'size_t'.

8 answers
三岁会撩人
#2 · 2019-01-26 10:07

In versions of .NET prior to 4.5, the maximum object size is 2GB. From 4.5 onwards, on 64-bit platforms, you can allocate arrays larger than 2GB if gcAllowVeryLargeObjects is enabled. Note that the limit for strings is not affected, but "arrays" should cover "lists" too, since lists are backed by arrays.

一纸荒年 Trace。
#3 · 2019-01-26 10:08

The fact that the framework uses Int32 for Count/Length properties, indexers, etc. is a bit of a red herring. The real problem is that the CLR currently has a maximum object size restriction of 2GB.

So a string -- or any other single object -- can never be larger than 2GB.

Changing the Length property of the string type to return long, ulong or even BigInteger would be pointless since you could never have more than approx 2^30 characters anyway (2GB max size and 2 bytes per character.)

Similarly, because of the 2GB limit, the only arrays that could even approach having 2^31 elements would be bool[] or byte[] arrays that only use 1 byte per element.
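
The arithmetic behind those two figures can be checked with a few lines of C#. This is only a rough sketch: it assumes the limit is exactly 2^31 bytes and ignores object headers, so the real ceilings are slightly lower. The class and member names are made up for illustration.

```csharp
static class ObjectSizeLimits
{
    // Assumption: a flat 2 GB (2^31 bytes) per object, with header overhead ignored.
    public const long MaxObjectBytes = 1L << 31;

    // Upper bound on how many elements of the given size fit in one object.
    public static long MaxElements(int bytesPerElement)
    {
        return MaxObjectBytes / bytesPerElement;
    }
}
```

With these assumptions, `MaxElements(sizeof(char))` gives 2^30 (the string case), `MaxElements(sizeof(byte))` gives 2^31, and `MaxElements(sizeof(long))` gives only 2^28, so for most element types the size limit bites long before the Int32 index does.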

Of course, there's nothing to stop you creating your own composite types to work around the 2GB restriction.
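
One possible shape for such a composite type is a "chunked" array that spreads its elements over many ordinary arrays and exposes a long indexer. The following is a minimal sketch, not a production container: `ChunkedArray<T>` and its chunk size are hypothetical names and choices, not anything from the framework.

```csharp
using System;

// Hypothetical composite type: elements live in fixed-size chunks, so no
// single underlying array has to come anywhere near the per-object limit.
public sealed class ChunkedArray<T>
{
    private const int ChunkBits = 20;              // 2^20 elements per chunk (arbitrary choice)
    private const int ChunkSize = 1 << ChunkBits;
    private const int ChunkMask = ChunkSize - 1;

    private readonly T[][] _chunks;

    public long Length { get; }

    public ChunkedArray(long length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Length = length;

        // Round up so a partially filled last chunk is still allocated.
        long chunkCount = (length + ChunkSize - 1) / ChunkSize;
        _chunks = new T[chunkCount][];
        for (long i = 0; i < chunkCount; i++)
        {
            long remaining = length - i * ChunkSize;
            _chunks[i] = new T[Math.Min(remaining, ChunkSize)];
        }
    }

    public T this[long index]
    {
        get
        {
            CheckIndex(index);
            // High bits select the chunk, low bits select the slot inside it.
            return _chunks[index >> ChunkBits][index & ChunkMask];
        }
        set
        {
            CheckIndex(index);
            _chunks[index >> ChunkBits][index & ChunkMask] = value;
        }
    }

    private void CheckIndex(long index)
    {
        if (index < 0 || index >= Length) throw new ArgumentOutOfRangeException(nameof(index));
    }
}
```

Usage would look like `var big = new ChunkedArray<byte>(5000000000L); big[4999999999L] = 1;`, assuming the machine actually has that much memory to give. The outer array of chunks is itself an ordinary array, so this design has its own (much higher) ceiling.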

(Note that the above observations apply to Microsoft's current implementation, and could very well change in future releases. I'm not sure whether Mono has similar limits.)

老娘就宠你
#4 · 2019-01-26 10:10

Even in x64 versions of Windows I got hit by .NET limiting each object to 2GB.

2GB is pretty small for a medical image. 2GB is even small for a Visual Studio download image.

时光不老,我们不散
#5 · 2019-01-26 10:12

At some value of String.Length, probably around 5MB, it's no longer practical to use String. String is optimised for short bits of text.

Think about what happens when you do

myString += " more chars";

Something like:

System calculates length of myString plus length of " more chars"

System allocates that amount of memory

System copies myString to new memory location

System copies " more chars" to new memory location after last copied myString char

The original myString is left to the mercy of the garbage collector.
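
The steps above can be sketched in C# roughly as follows. This is purely illustrative and is not how the runtime actually implements concatenation (this version even makes one extra copy at the end); `ConcatSketch.Append` is a made-up name.

```csharp
static class ConcatSketch
{
    // Illustrative only: mirrors the listed steps, not the runtime's real code path.
    public static string Append(string original, string suffix)
    {
        // 1. Calculate the combined length (checked, so going past int.MaxValue throws).
        int newLength = checked(original.Length + suffix.Length);

        // 2. Allocate that amount of memory in one contiguous block.
        char[] buffer = new char[newLength];

        // 3. Copy the original characters to the new location.
        original.CopyTo(0, buffer, 0, original.Length);

        // 4. Copy the suffix right after the last copied character.
        suffix.CopyTo(0, buffer, original.Length, suffix.Length);

        // 5. The old string is now garbage unless something else still references it.
        return new string(buffer);
    }
}
```

Every append copies the whole existing string again, so building a large string this way in a loop costs time proportional to the square of the final length.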

While this is nice and neat for small bits of text, it's a nightmare for large strings. Just finding 2GB of contiguous memory is probably a showstopper.

So if you know you are handling more than a very few MB of characters, use one of the *Buffer classes (in .NET, that means StringBuilder).
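
In .NET the class for this job is System.Text.StringBuilder. A minimal sketch of the idea, with a made-up helper name (`BuilderSketch.Repeat`), might look like this:

```csharp
using System.Text;

static class BuilderSketch
{
    // Builds piece repeated count times without creating a new string per append.
    public static string Repeat(string piece, int count)
    {
        // Reserve the final size up front; checked() throws instead of silently overflowing.
        var builder = new StringBuilder(checked(piece.Length * count));
        for (int i = 0; i < count; i++)
        {
            builder.Append(piece);   // writes into the builder's buffer
        }
        return builder.ToString();   // one final string allocation
    }
}
```

The final ToString() call still has to produce a single string, so this avoids the repeated copying but does not lift the size limit on the result.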

我命由我不由天
#6 · 2019-01-26 10:13

It's pretty unlikely that you'll need to store more than two billion objects in a single collection. You're going to incur some pretty serious performance penalties when doing enumerations and lookups, which are the two primary purposes of collections. If you're dealing with a data set that large, there is almost assuredly some other route you can take, such as splitting up your single collection into many smaller collections that each contain a portion of the data you're working with.

Heeeey, wait a sec.... we already have this concept -- it's called a dictionary!

If you need to store, say, 5 billion English strings, use this type:

Dictionary<string, List<string>> bigStringContainer;

Let's make the key string represent, say, the first two characters of the string. Then write an extension method like this:

public static string BigStringIndex(this string s)
{
    // Bucket key: the first two characters, or the whole string if it is shorter.
    return s.Length <= 2 ? s : s.Substring(0, 2);
}

and then add items to bigStringContainer like this:

bigStringContainer[item.BigStringIndex()].Add(item);

and call it a day. (There are obviously more efficient ways you could do that, but this is just an example)
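
One thing to watch in the one-liner above: the dictionary indexer throws KeyNotFoundException if no list exists yet for that key. A slightly fuller sketch creates the bucket on demand. `AddToBucket` is a hypothetical helper name, and this version of BigStringIndex also tolerates strings shorter than two characters.

```csharp
using System.Collections.Generic;

public static class BigStringExtensions
{
    // Bucket key: the first two characters, or the whole string if it is shorter.
    public static string BigStringIndex(this string s)
    {
        return s.Length <= 2 ? s : s.Substring(0, 2);
    }

    // Hypothetical helper: adds item to its bucket, creating the bucket if needed.
    public static void AddToBucket(this Dictionary<string, List<string>> container, string item)
    {
        string key = item.BigStringIndex();
        List<string> bucket;
        if (!container.TryGetValue(key, out bucket))
        {
            bucket = new List<string>();
            container[key] = bucket;
        }
        bucket.Add(item);
    }
}
```

With that in place, adding an item becomes `bigStringContainer.AddToBucket(item);`. Real text is unevenly distributed over two-character prefixes, so some buckets will be far larger than others.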

Oh, and if you really really really do need to be able to look up any arbitrary object by absolute index, use an Array instead of a collection. Okay yeah, you lose some type safety, but you can index array elements with a long.
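
For what it's worth, here is a small sketch of what long indexing looks like in C#. The helper names are made up, and accepting a long index does not by itself raise the runtime's limit on how many elements an array can hold.

```csharp
using System;

static class LongIndexDemo
{
    // A typed array accepts a long as its index expression.
    public static int ReadTyped(int[] data, long index)
    {
        return data[index];
    }

    // The untyped System.Array route: GetValue has a long overload but returns object,
    // which is where the type safety goes.
    public static object ReadUntyped(Array data, long index)
    {
        return data.GetValue(index);
    }

    // LongLength reports the element count as a 64-bit value.
    public static long CountOf(Array data)
    {
        return data.LongLength;
    }
}
```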

再贱就再见
#7 · 2019-01-26 10:17

Correct, the maximum length would be Int32.MaxValue; however, you'll likely run into other memory issues well before you get to strings of that size anyway.
