int vs size_t on 64bit

2019-03-24 05:15发布

问题:

Porting code from 32bit to 64bit. Lots of places with

int len = strlen(pstr);

These all generate warnings now because strlen() returns size_t which is 64bit and int is still 32bit. So I've been replacing them with

size_t len = strlen(pstr);

But I just realized that this is not safe, as size_t is unsigned and it can be treated as signed by the code (I actually ran into one case where it caused a problem, thank you, unit tests!).

Blindly casting strlen return to (int) feels dirty. Or maybe it shouldn't?
So the question is: is there an elegant solution for this? I probably have a thousand lines of code like that in the codebase; I can't manually check each one of them and the test coverage is currently somewhere between 0.01 and 0.001%.

回答1:

As a compromise, you could use ssize_t (if available). Fake it up if not, using long long, int_fast64_t, intmax_t, or have a platform porting header which lets a suitable type be specified for a platform. ssize_t is in POSIX not standard C or C++, but if you ever hit a platform which doesn't have a signed type of the same size as size_t then I sympathise.

A cast to int is nearly safe (assuming 32 bit int on your 64 bit platform, which seems reasonable), because a string is unlikely to be more than 2^31 bytes long. A cast to a larger signed type is even safer. Customers who can afford 2^63 bytes of memory are what's known in the trade as "a good problem to have" ;-)

Of course, you could check it:

size_t ulen = strlen(pstr);
if (ulen > SSIZE_MAX) abort(); // preferably trace, log, return error, etc.
ssize_t len = (ssize_t) ulen;

Sure there's an overhead, but if you have 1000 instances then they can't all be performance-critical. For the ones which are (if any), you can do the work to investigate whether len being signed actually matters. If it doesn't, switch to size_t. If it does, rewrite or just take a risk on never meeting an object that absurdly huge. The original code would almost certainly have done the wrong thing anyway on the 32bit platform, if len had been negative as a result of strlen returning a value bigger than INT_MAX.



回答2:

Some time ago I posted a short note about this kind of issues on my blog and the short answer is:

Always use proper C++ integer types

Long answer: When programming in C++, it’s a good idea to use proper integer types relevant to particular context. A little bit of strictness always pays back. It’s not uncommon to see a tendency to ignore the integral types defined as specific to standard containers, namely size_type. It’s available for number of standard container like std::string or std::vector. Such ignorance may get its revenge easily.

Below is a simple example of incorrectly used type to catch result of std::string::find function. I’m quite sure that many would expect there is nothing wrong with the unsigned int here. But, actually this is just a bug. I run Linux on 64-bit architecture and when I compile this program as is, it works as expected. However, when I replace the string in line 1 with abc, it still works but not as expected :-)

#include <iostream>
#include <string>
using namespace std;
int main()
{
  string s = "a:b:c"; // "abc" [1]
  char delim = ':';
  unsigned int pos = s.find(delim);
  if(string::npos != pos)
  {
    cout << delim << " found in " << s << endl;
  }
}

Fix is very simply. Just replace unsigned int with std::string::size_type. The problem could be avoided if somebody who wrote this program took care of use of correct type. Not to mention that the program would be portable straight away.

I’ve seen this kind of issues quite many times, especially in code written by former C programmers who do not like to wear the muzzle of strictness the C++ types system enforces and requires. The example above is a trivial one, but I believe it presents the root of the problem well.

I recommend brilliant article 64-bit development written by Andrey Karpov where you can find a lot more on the subject.



回答3:

Setting the compiler warnings to the maximum level should get you a nice report of every incorrect sign conversion. In gcc, '-Wall -Wextra' should do.

You can also use a static code analyzer like cppcheck to see if everything is right.



回答4:

You could use ssize_t (the signed variant of size_t).



回答5:

You can treat site_t signed safely in most cases. The unsigned size_t will be treated as negative only when it (or the intermediate results in expressions) is bigger then 2^31 (for 32-bit) or 2^63 for 64 bit.

UPDATE: Sorry, size_t will be unsafe in constructions like while ( (size_t)t >=0 ). So right answer is to use ssize_t.



回答6:

If your compiler supports c++0x:

auto len = strlen(pstr);