Today I noticed that C#'s String class returns the length of a string as an Int. Since an Int is always 32-bits, no matter what the architecture, does this mean that a string can only be 2GB or less in length?
A 2GB string would be very unusual, and present many problems along with it. However, most .NET api's seem to use 'int' to convey values such as length and count. Does this mean we are forever limited to collection sizes which fit in 32-bits?
Seems like a fundamental problem with the .NET API's. I would have expected things like count and length to be returned via the equivalent of 'size_t'.
In versions of .NET prior to 4.5, the maximum object size is 2GB. From 4.5 onwards you can allocate larger objects if gcAllowVeryLargeObjects is enabled. Note that the limit for
string
is not affected, but "arrays" should cover "lists" too, since lists are backed by arrays.The fact that the framework uses
Int32
forCount
/Length
properties, indexers etc is a bit of a red herring. The real problem is that the CLR currently has a max object size restriction of 2GB.So a
string
-- or any other single object -- can never be larger than 2GB.Changing the
Length
property of thestring
type to returnlong
,ulong
or evenBigInteger
would be pointless since you could never have more than approx 2^30 characters anyway (2GB max size and 2 bytes per character.)Similarly, because of the 2GB limit, the only arrays that could even approach having 2^31 elements would be
bool[]
orbyte[]
arrays that only use 1 byte per element.Of course, there's nothing to stop you creating your own composite types to workaround the 2GB restriction.
(Note that the above observations apply to Microsoft's current implementation, and could very well change in future releases. I'm not sure whether Mono has similar limits.)
Even in x64 versions of Windows I got hit by .Net limiting each object to 2GB.
2GB is pretty small for a medical image. 2GB is even small for a Visual Studio download image.
At some value of String.length() probably about 5MB its not really practical to use String anymore. String is optimised for short bits of text.
Think about what happens when you do
Something like:
System calculates length of myString plus length of " more chars"
System allocates that amount of memory
System copies myString to new memory location
System copies " more chars" to new memory location after last copied myString char
The original myString is left to the mercy of the garbage collector.
While this is nice and neat for small bits of text its a nightmare for large strings, just finding 2GB of contiguous memory is probably a showstopper.
So if you know you are handling more than a very few MB of characters use one of the *Buffer classes.
It's pretty unlikely that you'll need to store more than two billion objects in a single collection. You're going to incur some pretty serious performance penalties when doing enumerations and lookups, which are the two primary purposes of collections. If you're dealing with a data set that large, There is almost assuredly some other route you can take, such as splitting up your single collection into many smaller collections that contain portions of the entire set of data you're working with.
Heeeey, wait a sec.... we already have this concept -- it's called a dictionary!
If you need to store, say, 5 billion English strings, use this type:
Let's make the key string represent, say, the first two characters of the string. Then write an extension method like this:
and then add items to bigStringContainer like this:
and call it a day. (There are obviously more efficient ways you could do that, but this is just an example)
Oh, and if you really really really do need to be able to look up any arbitrary object by absolute index, use an
Array
instead of a collection. Okay yeah, you use some type safety, but you can index array elements with along
.Correct, the maximum length would be the size of Int32, however you'll likely run into other memory issues if you're dealing with strings larger than that anyway.