-->

Is atoi multithread safe? [closed]

Posted 2019-09-22 11:48

Question:

I am experiencing an error while creating a multithreaded program. While using gdb to debug, the atoi function is throwing an error. Please help: is atoi unsafe in multithreaded code and, if so, what are the alternatives?

Answer 1:

It's quite easy to implement a replacement for atoi():

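#include <ctype.h>

int strToInt(const char *text)
{
  int n = 0, sign = 1;
  switch (*text) {
    case '-': sign = -1; /* intentional fall-through to skip the sign */
    case '+': ++text;
  }
  /* the cast avoids undefined behavior of isdigit() for negative char values */
  for (; isdigit((unsigned char)*text); ++text) n *= 10, n += *text - '0';
  return n * sign;
}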

(Demonstration on ideone)

It doesn't seem to make much sense to replace something which is already available. Thus, I want to mention some thoughts about this.

The implementation can be adjusted to the precise personal requirements:

  • a check for integer overflow may be added
  • the final value of text may be returned (as in strtol()) to check how many characters have been processed or to do further parsing of other contents
  • a variant might be used for unsigned (which does not accept a sign).
  • preceding spaces may or may not be accepted
  • special syntax may be considered
  • and anything else beyond my imagination.
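
A few of these adjustments (overflow check, returning the final position, rejecting input without digits) could look roughly like the following. This is only a sketch: the name strToIntChecked() and its return codes are made up for this illustration.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical variant of strToInt().
 * Returns 0 on success, -1 if no digits were found, -2 on overflow.
 * On success, *out holds the value. If end is not NULL, *end receives
 * the position of the first character that was not consumed. */
int strToIntChecked(const char *text, int *out, const char **end)
{
  const char *p = text;
  int negative = 0;
  if (*p == '-' || *p == '+') negative = (*p++ == '-');
  if (!isdigit((unsigned char)*p)) {
    if (end) *end = text; /* nothing was consumed */
    return -1;
  }
  /* Accumulate as a non-positive number, so that INT_MIN can be
   * represented without overflowing in between. */
  int n = 0, status = 0;
  for (; isdigit((unsigned char)*p); ++p) {
    int digit = *p - '0';
    if (status != 0) continue; /* keep consuming digits after overflow */
    if (n < (INT_MIN + digit) / 10) status = -2;
    else n = n * 10 - digit;
  }
  if (end) *end = p;
  if (status == 0 && !negative) {
    if (n < -INT_MAX) status = -2; /* -n would not fit into int */
    else n = -n;
  }
  if (status != 0) return status;
  *out = n;
  return 0;
}
```

A caller would check the return code first and only then use *out; the end pointer allows parsing to continue behind the number, similar to strtol().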

Extending this idea to other numeric types such as float or double, it becomes even more interesting.

As floating-point numbers are definitely subject to localization, this has to be considered. (For decimal integers, I'm not sure what could be localized, but even this might be the case.) If you implement a text file reader with a C-like floating-point syntax, you must not forget to switch the locale to C (using setlocale()) before calling strtod(). (Being German, I'm sensitive to this topic: in the German locale, the meanings of '.' and ',' are swapped compared to English.)

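{ /* setlocale(LC_ALL, "C") would return the new locale, so query the
   * current one first and copy it (later calls may overwrite the string) */
  char *localeOld = strdup(setlocale(LC_ALL, NULL));
  setlocale(LC_ALL, "C");
  value = strtod(text, NULL);
  setlocale(LC_ALL, localeOld);
  free(localeOld);
}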
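
With respect to the original question, note that setlocale() changes the locale of the whole process, so switching it temporarily is risky while other threads are running. On systems that implement POSIX.1-2008, a locale can instead be set for the calling thread only with newlocale() and uselocale(). The following is a minimal sketch under that assumption; the helper name strToDoubleC() is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: parses a double with the "C" locale active for
 * the calling thread only. Returns 0 on success, -1 on failure. */
int strToDoubleC(const char *text, double *out)
{
  /* create a locale object for "C" (assumes POSIX.1-2008 support) */
  locale_t cLocale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
  if (cLocale == (locale_t)0) return -1;
  /* switch the locale of this thread only, remembering the previous one */
  locale_t localeOld = uselocale(cLocale);
  char *end;
  double value = strtod(text, &end);
  /* restore the previous thread locale before releasing the "C" object */
  uselocale(localeOld);
  freelocale(cLocale);
  if (end == text) return -1; /* no conversion was performed */
  *out = value;
  return 0;
}
```

Creating and freeing the locale object on every call keeps the sketch short; a real reader would probably create it once and reuse it.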

Another point is that taking the locale into account (even if it is set to C) seems to be somewhat expensive. Some years ago, we implemented our own floating-point reader as a replacement for strtod(), which provided a speed-up of 60 ... 100 in a COLLADA reader (an XML file format where files often contain lots of floating-point numbers).

Update:

Encouraged by the feedback of Paul Floyd, I got curious how much faster strToInt() might be. Thus, I built a simple test suite and made some measurements:

#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int strToInt(const char *text)
{
  int n = 0, sign = 1;
  switch (*text) {
    case '-': sign = -1;
    case '+': ++text;
  }
  for (; isdigit((unsigned char)*text); ++text) n *= 10, n += *text - '0';
  return n * sign;
}

int main(int argc, char **argv)
{
  int n = 10000000; /* default number of measurements */
  /* read command line options */
  if (argc > 1) n = atoi(argv[1]);
  if (n <= 0) return 1; /* ERROR */
  /* build samples */
  assert(sizeof(int) <= 8); /* Maybe I will want to run this again in 20 years. */
  /* 24 characters are enough to hold any decimal int
   * (up to 64 bit)
   */
  char (*samples)[24] = malloc(n * 24 * sizeof(char));
  if (!samples) {
    printf("ERROR: Cannot allocate samples!\n"
      "(Out of memory.)\n");
    return 1;
  }
  for (int i = 0; i < n; ++i) sprintf(samples[i], "%d", i - (i & 1) * n);
  /* assert correct results, ensure fair caching, pre-heat CPU */
  int *retAToI = malloc(n * sizeof(int));
  if (!retAToI) {
    printf("ERROR: Cannot allocate result array for atoi()!\n"
      "(Out of memory.)\n");
    return 1;
  }
  int *retStrToInt = malloc(n * sizeof(int));
  if (!retStrToInt) {
    printf("ERROR: Cannot allocate result array for strToInt()!\n"
      "(Out of memory.)\n");
    return 1;
  }
  int nErrors = 0;
  for (int i = 0; i < n; ++i) {
    retAToI[i] = atoi(samples[i]); retStrToInt[i] = strToInt(samples[i]);
    if (retAToI[i] != retStrToInt[i]) {
      printf("ERROR: atoi(\"%s\"): %d, strToInt(\"%s\"): %d!\n",
        samples[i], retAToI[i], samples[i], retStrToInt[i]);
      ++nErrors;
    }
  }
  if (nErrors) {
    printf("%d ERRORs found!\n", nErrors);
    return 2;
  }
  /* do measurements */
  enum { nTries = 10 };
  clock_t tTbl[nTries][2]; /* clock() returns clock_t, not time_t */
  for (int i = 0; i < nTries; ++i) {
    printf("Measurement %d:\n", i + 1);
    { clock_t t0 = clock();
      for (int k = 0; k < n; ++k) retAToI[k] = atoi(samples[k]);
      tTbl[i][0] = clock() - t0;
    }
    { clock_t t0 = clock();
      for (int k = 0; k < n; ++k) retStrToInt[k] = strToInt(samples[k]);
      tTbl[i][1] = clock() - t0;
    }
    /* assert correct results (and prevent the measurement from being optimized away) */
    for (int k = 0; k < n; ++k) if (retAToI[k] != retStrToInt[k]) return 3;
  }
  /* report */
  printf("Report:\n");
  printf("%20s|%20s\n", "atoi() ", "strToInt() ");
  printf("--------------------+--------------------\n");
  double tAvg[2] = { 0.0, 0.0 }; const char *sep = "|\n";
  for (int i = 0; i < nTries; ++i) {
    for (int j = 0; j < 2; ++j) {
      double t = (double)tTbl[i][j] / CLOCKS_PER_SEC;
      printf("%19.3f %c", t, sep[j]);
      tAvg[j] += t;
    }
  }
  printf("--------------------+--------------------\n");
  for (int j = 0; j < 2; ++j) printf("%19.3f %c", tAvg[j] / nTries, sep[j]);
  /* done */
  return 0;
}

I tried this on some platforms.

VS2013 on Windows 10 (64 bit), Release mode:

Report:
             atoi() |         strToInt()
--------------------+--------------------
              0.232 |              0.200
              0.310 |              0.240
              0.253 |              0.199
              0.231 |              0.201
              0.232 |              0.253
              0.247 |              0.201
              0.238 |              0.201
              0.247 |              0.223
              0.248 |              0.200
              0.249 |              0.200
--------------------+--------------------
              0.249 |              0.212

gcc 5.4.0 on cygwin, Windows 10 (64 bit), gcc -std=c11 -O2:

Report:
             atoi() |         strToInt() 
--------------------+--------------------
              0.360 |              0.312 
              0.391 |              0.250 
              0.360 |              0.328 
              0.391 |              0.312 
              0.375 |              0.281 
              0.359 |              0.282 
              0.375 |              0.297 
              0.391 |              0.250 
              0.359 |              0.297 
              0.406 |              0.281 
--------------------+--------------------
              0.377 |              0.289

Sample uploaded and executed on codingground
gcc 4.8.5 on Linux 3.10.0-327.36.3.el7.x86_64, gcc -std=c11 -O2:

Report:
             atoi() |         strToInt() 
--------------------+--------------------
              1.080 |              0.750 
              1.000 |              0.780 
              0.980 |              0.770 
              1.010 |              0.770 
              1.000 |              0.770 
              1.010 |              0.780 
              1.010 |              0.780 
              1.010 |              0.770 
              1.020 |              0.780 
              1.020 |              0.780 
--------------------+--------------------
              1.014 |              0.773 

Well, strToInt() is a little bit faster. (Without -O2, it was even slower than atoi(), but the standard library itself was probably built with optimization.)

Note:

As the time measurement also includes the assignment and loop operations, it only provides a qualitative statement about which one is faster, not a quantitative factor. (To get one, the measurement would become much more complicated.)

Because atoi() is so simple, an application would have to call it very often before the development effort for a replacement is even worth considering...



Answer 2:

Is atoi multithread safe?

Yes. The Linux man page of atoi() states:

┌────────────────────────┬───────────────┬────────────────┐
│Interface               │ Attribute     │ Value          │
├────────────────────────┼───────────────┼────────────────┤
│atoi(), atol(), atoll() │ Thread safety │ MT-Safe locale │
└────────────────────────┴───────────────┴────────────────┘

So atoi() is thread-safe (MT-Safe) as long as no other thread changes the locale while it runs; that is what the "locale" remark means (see the explanation of the terms below). Apart from the locale, it only works on the string you pass in.

Even if two threads pass the same memory location, e.g. a pointer to the same char array, both function calls merely read from it. In the case of atoi() that is not a problem as long as no thread modifies the array at the same time, because the function only reads from memory: see the argument const char *nptr, a pointer to constant characters.
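
The usage pattern described above can be sketched as follows, assuming a POSIX system with pthreads. Each thread calls atoi() on its own buffer and no thread changes the locale. The names Job, worker() and countMismatches() are made up for this illustration, and a run without mismatches only illustrates the pattern; it does not prove thread safety.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

enum { N_THREADS = 4, N_ITERATIONS = 100000 };

/* Hypothetical per-thread job: every thread owns its input string. */
typedef struct {
  char text[16];
  int expected;
  int mismatches;
} Job;

static void *worker(void *arg)
{
  Job *job = arg;
  /* atoi() only reads job->text; no other thread writes to it */
  for (int i = 0; i < N_ITERATIONS; ++i) {
    if (atoi(job->text) != job->expected) ++job->mismatches;
  }
  return NULL;
}

/* Runs the workers concurrently and returns the total number of wrong
 * results, or -1 if a thread could not be started. */
int countMismatches(void)
{
  pthread_t threads[N_THREADS];
  Job jobs[N_THREADS];
  int started = 0, total = 0;
  for (int i = 0; i < N_THREADS; ++i) {
    jobs[i].expected = (i + 1) * 1000;
    jobs[i].mismatches = 0;
    snprintf(jobs[i].text, sizeof jobs[i].text, "%d", jobs[i].expected);
    if (pthread_create(&threads[i], NULL, worker, &jobs[i]) != 0) break;
    ++started;
  }
  /* wait for all started threads before reading their results */
  for (int i = 0; i < started; ++i) {
    pthread_join(threads[i], NULL);
    total += jobs[i].mismatches;
  }
  return started == N_THREADS ? total : -1;
}
```

Depending on the toolchain, this may have to be compiled with the -pthread option.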


Here is also an explanation of the terms/attributes.

MT-Safe:

MT-Safe or Thread-Safe functions are safe to call in the presence of other threads. MT, in MT-Safe, stands for Multi Thread.

locale:

Functions annotated with locale as an MT-Safety issue read from the locale object without any form of synchronization. Functions annotated with locale called concurrently with locale changes may behave in ways that do not correspond to any of the locales active during their execution, but an unpredictable mix thereof.


While using gdb to debug, the atoi function is throwing an error.

The atoi() function doesn't provide any error information at all. If the conversion is not successful, it returns 0, and you cannot tell whether 0 was the actual number to convert. Furthermore, atoi() does not throw at all! I produced the following output with a small piece of C code (see it online at ideone):

atoi with "3"        to integer: +3
atoi with "    3   " to integer: +3
atoi with "   -3   " to integer: -3
atoi with "str 3   " to integer: +0
atoi with "str-3   " to integer: +0
atoi with "    3str" to integer: +3
atoi with "   -3str" to integer: -3
atoi with "str-3str" to integer: +0

You can see that atoi() converts successfully if the string starts with a number, ignoring leading whitespace and any characters after the number. If non-numeric characters come first, it fails and returns 0; it does not throw.


You should consider using strtol() instead, as it can detect range overflows, in which case it sets errno to ERANGE.
Furthermore, you get an end pointer which shows how many characters were consumed. If the end pointer equals the start pointer (a difference of 0), no conversion was performed. It is thread-safe like atoi().

I produced the same kind of output for strtol(); you can also see it in the ideone online example from above:

0: strtol with "3"         to integer: +3 | errno =  0, StartPtr = 0x7ffc47e9a140, EndPtr = 0x7ffc47e9a141, PtrDiff = 1
1: strtol with "    3   "  to integer: +3 | errno =  0, StartPtr = 0x7ffc47e9a130, EndPtr = 0x7ffc47e9a135, PtrDiff = 5
2: strtol with "   -3   "  to integer: -3 | errno =  0, StartPtr = 0x7ffc47e9a120, EndPtr = 0x7ffc47e9a125, PtrDiff = 5
3: strtol with "str 3   "  to integer: +0 | errno =  0, StartPtr = 0x7ffc47e9a110, EndPtr = 0x7ffc47e9a110, PtrDiff = 0 --> Error!
4: strtol with "str-3   "  to integer: +0 | errno =  0, StartPtr = 0x7ffc47e9a100, EndPtr = 0x7ffc47e9a100, PtrDiff = 0 --> Error!
5: strtol with "    3str"  to integer: +3 | errno =  0, StartPtr = 0x7ffc47e9a0f0, EndPtr = 0x7ffc47e9a0f5, PtrDiff = 5
6: strtol with "   -3str"  to integer: -3 | errno =  0, StartPtr = 0x7ffc47e9a0e0, EndPtr = 0x7ffc47e9a0e5, PtrDiff = 5
7: strtol with "str-3str"  to integer: +0 | errno =  0, StartPtr = 0x7ffc47e9a0d0, EndPtr = 0x7ffc47e9a0d0, PtrDiff = 0 --> Error!
8: strtol with "s-r-3str"  to integer: +0 | errno =  0, StartPtr = 0x7ffc47e9a0c0, EndPtr = 0x7ffc47e9a0c0, PtrDiff = 0 --> Error!

The correct use of strtol() for error detection is discussed in the thread "Detecting strtol failure".
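
The checks mentioned above (end pointer, errno and the range of int) can be combined in a small wrapper. This is a minimal sketch; the name parseInt() is made up for this illustration, and rejecting trailing characters is a design choice of the sketch, not something strtol() requires.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper around strtol().
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parseInt(const char *text, int *out)
{
  char *end;
  errno = 0; /* strtol() only sets errno on failure, so reset it first */
  long value = strtol(text, &end, 10);
  if (end == text) return -1;     /* no digits were consumed */
  if (errno == ERANGE) return -1; /* value does not fit into long */
  if (value < INT_MIN || value > INT_MAX) return -1; /* does not fit into int */
  if (*end != '\0') return -1;    /* trailing characters, e.g. "3str" */
  *out = (int)value;
  return 0;
}
```

Unlike atoi(), this lets the caller distinguish a valid "0" from input such as "str 3". errno is thread-local on C11 and POSIX systems, so resetting it does not affect other threads.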