Signed vs. unsigned integers for lengths/counts

Posted 2019-01-14 02:38

Question:

For representing a length or count variable, is it better to use signed or unsigned integers?

It seems to me that the C++ STL tends to prefer unsigned (std::size_t, as in std::vector::size()), whereas the C# BCL tends to prefer signed integers (as in ICollection.Count).

Considering that a length or a count is a non-negative integer, my intuition would choose unsigned; but I fail to understand why the .NET designers chose signed integers.

What is the best approach? What are the pros and cons of each one?

Answer 1:

C++ uses unsigned values because it needs the full range. On a 32-bit system, the language should make it possible to have a 4 GB vector, not just a 2 GB one. (The OS might not allow you to use all 4 GB, but the language itself doesn't want to get in your way.)

In .NET, unsigned integers aren't CLS-compliant. You can use them (in some .NET languages), but it limits portability and compatibility. So for the base class library, they only use signed integers.

However, these are both edge cases. For most purposes, a signed int is big enough. So as long as both offer the range you need, you can use either.

One advantage that signed integers sometimes have is that they make it easier to detect underflow. Suppose you're computing an array index, and because of some bad input, or perhaps a logic error in your program, you end up trying to access index -1.

With a signed integer, that is easy to detect. With unsigned, it would wrap around and become UINT_MAX. That makes it much harder to detect the error, because you expected a positive number, and you got a positive number.
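A minimal sketch of that difference, assuming a hypothetical index computation (`signed_index`, `unsigned_index`, and `demo` are names made up for illustration, not from any library):

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: steps back `offset` positions from `position`.
// With a signed type, a bad offset produces a negative index.
int signed_index(int position, int offset) {
    return position - offset;
}

// Same computation with an unsigned type: the subtraction wraps modulo
// UINT_MAX + 1, so the result is a large positive number instead of -1.
unsigned int unsigned_index(unsigned int position, unsigned int offset) {
    return position - offset;
}

// Illustrative checks for the "index -1" case described above.
void demo() {
    assert(signed_index(0, 1) == -1);            // negative, easy to test for
    assert(signed_index(0, 1) < 0);
    assert(unsigned_index(0u, 1u) == UINT_MAX);  // wrapped, still "positive"
}
```

When the container size is known, an upper-bound check (index < size) would still reject the wrapped value, but a plain "is it negative?" test can never fire for the unsigned version.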

So really, it depends. C++ uses unsigned because it needs the range. .NET uses signed because it needs to work with languages which don't have unsigned.

In most cases, both will work, and sometimes, signed may enable your code to detect errors more robustly.



Answer 2:

It's natural to use unsigned types for counts and sizes unless we're in some context where they can be negative and yet be meaningful. My guess is that C++ follows the same logic as its elder brother C, in which strlen() returns size_t and malloc() takes size_t.

The problem in C++ (and C) with signed and unsigned integers is that you must know how they are converted to one another when you're using a mixture of the two kinds. Some advocate using signed ints for everything integer to avoid this issue of programmers' ignorance and inattention. But I think programmers must know how to use the tools of their trade (programming languages, compilers, etc.). Sooner or later they'll be bitten by the conversion, if not in what they have written, then in what someone else has. It's unavoidable.
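To make the conversion issue concrete, here is a small sketch. The function names (`mixed_less`, `safe_less`, `demo`) are hypothetical; the cast in `mixed_less` spells out what the compiler does implicitly when an int is compared with an unsigned int:

```cpp
#include <cassert>

// What `a < b` does implicitly when a is int and b is unsigned int:
// a is converted to unsigned int first. The cast is written out here
// to make the conversion visible and to avoid sign-compare warnings.
bool mixed_less(int a, unsigned int b) {
    return static_cast<unsigned int>(a) < b;
}

// Value-preserving comparison: handle the negative case before converting.
bool safe_less(int a, unsigned int b) {
    return a < 0 || static_cast<unsigned int>(a) < b;
}

// Illustrative checks.
void demo() {
    assert(!mixed_less(-1, 1u));  // -1 became a huge unsigned value
    assert(safe_less(-1, 1u));    // mathematically, -1 < 1
    assert(mixed_less(0, 1u) && safe_less(0, 1u));
}
```

If C++20 is available, std::cmp_less from <utility> performs the same kind of value-preserving comparison.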

So, know your tools, choose what makes sense in your situation.



Answer 3:

There are a few aspects here:

1) Max values: the maximum value of a signed type is typically about half that of the corresponding unsigned type. For example, in C the maximum signed short value is typically 32767, whereas the maximum unsigned short value is 65535 (because half of the range is not spent on negative numbers). So if you're expecting lengths or counts that are going to be large, an unsigned representation makes more sense.

2) Security: You can browse the net for integer overflow errors, but imagine code such as:

if (length <= 100)
{
  // do something with file
}

... then if 'length' is a signed value, you run the risk of 'length' being a negative number (through malicious intent, some cast, etc.) and the code not performing as you expected. I've seen this on a previous project where a sequence was incremented for each transaction, but when the signed integer we used reached the maximum signed int value (2147483647), it suddenly became negative after the next increment and our code couldn't handle it.
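As a rough sketch of the length check in point 2 (the limit `kMaxLength` and all function names are assumptions made up for illustration):

```cpp
#include <cassert>
#include <cstddef>

constexpr int kMaxLength = 100;  // hypothetical limit, mirrors the check above

// Upper-bound-only check on a signed length: negative values slip through.
bool naive_length_ok(int length) {
    return length <= kMaxLength;
}

// Checking both bounds rejects negative lengths.
bool checked_length_ok(int length) {
    return length >= 0 && length <= kMaxLength;
}

// What a size_t-taking API would receive if the unchecked value were
// passed on: a negative length converts to a very large size.
std::size_t as_size(int length) {
    return static_cast<std::size_t>(length);
}

// Illustrative checks.
void demo() {
    assert(naive_length_ok(-1));     // passes the naive check
    assert(!checked_length_ok(-1));  // rejected once the lower bound is tested
    assert(as_size(-1) > 100u);      // and would be huge as a size
}
```

With an unsigned length the single upper-bound test would be enough, which is the point the answer is making; with a signed length the lower bound has to be checked explicitly.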

Just some things to think about, regardless of the underlying language/API considerations.



Answer 4:

If you aren't designing a reusable library (in .NET terms, e.g. a VB.NET project consuming your C# class library), then pick what works for you. Of course, if you are creating any kind of DLL and it's feasible that your library could be used in a project written in a different language (again, VB.NET comes to mind), then you need to be mindful of the non-CLS-compliant types (unsigned).