When to use different integer types?

Posted 2019-03-25 08:18

Question:

Programming languages (e.g. C, C++, and Java) usually have several types for integer arithmetic:

  • signed and unsigned types
  • types of different size: short, int, long, long long
  • types of guaranteed and non-guaranteed (i.e. implementation-dependent) size:
    e.g. int32_t vs int (and I know that int32_t is not part of the language)

How would you summarize when one should use each of them?

Answer 1:

The default integral type (int) gets a "first among equals" preferential treatment in pretty much all languages. So we can use that as a default, if no reasons to prefer another type exist.

Such reasons might be:

  • Using a bigger type if you know you need the additional range, or a smaller type if you want to conserve memory and don't mind the smaller range.
  • Using an unsigned type to make sure that you don't get any "extra" 1s in your integer representation if you intend to use bit shifting operators (<< and >>).
  • If the language does not guarantee a minimum (or even fixed) size for a type (e.g. C/C++ vs C#/Java), and you care about its properties, you should prefer some mechanism of generating a type with guaranteed size (e.g. int32_t) -- if your program is meant to be portable and expected to be compiled with different compilers, this becomes more important.

Update (expanding on guaranteed size types)

My personal opinion is that types with no guaranteed fixed size are more trouble than they're worth today. I won't go into the historical reasons that gave birth to them (briefly: source-code portability), but the reality is that in 2011 very few people, if any, stand to benefit from them.

On the other hand, there are lots of things that can go wrong when using such types:

  • The type turns out to not have the necessary range
  • You access the underlying memory for a variable (maybe to serialize it) but due to the processor's endianness and the non-fixed size of the type you end up introducing a bug

For these reasons (and there are probably others too), using such types is in theory a major pain. Additionally, unless extreme portability is a requirement, you gain nothing in return. And indeed, the whole purpose of typedefs like int32_t is to eliminate the use of loosely sized types entirely.

As a practical matter, if you know that your program is not going to be ported to another compiler or architecture, you can ignore the fact that the types have no fixed size and treat them as if they are the known size your compiler uses for them.



Answer 2:

Taking your questions one by one:

  1. signed and unsigned: depends on what you need. If you're sure the number will never be negative, use unsigned; this gives you a larger positive range. For example, a signed char (1 byte) has the range [-128, 127], but in an unsigned char the sign bit becomes an extra value bit, so the maximum is roughly doubled: an unsigned char can hold up to 255 (all bits set to 1).

  2. short, int, long, long long: these are pretty clear, aren't they? The smallest integer type (except char) is short, the next is int, and so on. But they are platform dependent: int could be 2 bytes (long, long ago :D) or 4 bytes (usually); long could be 4 bytes (on a 32-bit platform) or 8 bytes (on a 64-bit platform), etc. long long was not a standard C++ type when this was written (it was added in C++0x/C++11), but it is usually a typedef for int64_t.

  3. int32_t vs int: int32_t and the other fixed-width types guarantee their size. For example, int32_t is guaranteed to be 32 bits, while, as I already said, the size of int is platform dependent.



Answer 3:

Use shorter types to save memory and longer types to be able to represent larger numbers. If you don't have such requirements, consider which APIs you'll be sharing data with and set yourself up so you don't have to cast or convert too much.



Answer 4:

In general you should use the type that suits the requirements of your program and promotes readability and future maintainability as much as possible.

Having said that, as Chris points out, people do use shorts vs ints to save memory. Consider the following scenario: you have 1,000,000 (a fairly small number) integers, stored as ints (typically 32 bits) vs shorts (typically 16 bits). If you know you'll never need to represent a number larger than 32,767, you could use a short. Or you could use an unsigned short if you know you'll never need a number larger than 65,535. This would save ((32 - 16) bits × 1,000,000) = 16,000,000 bits ≈ 2 MB of memory.



Answer 5:

You should also consider using unbounded integers and naturals, for when you want to model numbers properly without worrying about overflow.

Libraries such as GMP and OpenSSL provide "big nums" that support arbitrarily large results -- which is usually the safest thing, unless you can prove the results of your computations are within bounds.

Additionally, many languages default to unbounded integer types because they are safer.



Answer 6:

Typically, you use int, unless you need to expand it because you need a larger range, or you want to shrink it because you know the value only makes sense in a smaller range. It's incredibly rare that you would need to change types for memory considerations: the difference between them is minuscule.



Answer 7:

The need for different integer sizes arises from two basic problems: type size pretty much needs to be known at a low level, and in the old days of memory constraints it mattered in everyday usage because you had little memory to play with. If you're doing basic arithmetic, then it is almost unimportant which type you use. For example, when looping from 1 to 100, it isn't really important whether you use uint16_t or uint64_t, provided your values fit and you use a signed type only when you actually need negative values.

However, it becomes important when you want to scale or need to optimise. If you want to allocate 1,000,000,000 such integers, then saying "each one will be 64-bit just in case" or "I'll use the built-in type" isn't going to cut it: that's 8,000,000,000 bytes, which is about 7.5 GB of data. Now, a billion 64-bit integers may not sound like a realistic workload, but these could be points in a 3D array (1000^3), in which case you've also got pointer sizes to contend with. You could go a lot further with 8-bit integers for your values, for example. How much you can allocate then informs your algorithms: if you can't allocate that much data at once, you might consider mmap or processing parts of the data at a time (i.e. handling the swapping yourself). If you don't need values beyond a certain size, that's when you start using more constrained types. Similarly, you get an extra power of 2 in range by using unsigned types. Very useful.

If you're ever writing assembly to go along with your C/C++, then knowing the size of the data type is pretty much crucial, especially when it comes to allocating local stack space or reading variables from memory addresses (as opposed to already in a register). Think of it as programming entirely using void*. As such, I tend to use stdint.h defined types by habit, so that I'm always aware of my sizes and formats, when mixing the two.



Answer 8:

Maybe just for fun, here is a simple example showing how, depending on which type you choose, you get one result or another.

Naturally, the real reason for choosing one type over another is, in my opinion, usually related to other factors; for instance, the shift operators.

#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    int i;

    // Try each of these declarations in turn and compare the output:
    //unsigned long long x;
    //int x;
    short x;

    x = 2;
    for (i = 2; i < 15; ++i)
    {
        // Repeated squaring quickly exceeds the range of a short
        // (at most 32,767 for a 16-bit short), at which point the
        // printed values stop being the true powers of two.
        x = pow(x, 2);
        cout << x << endl;
    }
    return 0;
}