atoi() with other languages

I am working on an internationalization project. Do other languages, such as Arabic or Chinese, use different representations for digits besides 0-9? If so, are there versions of atoi() that account for these other representations?

I should add that I am mainly concerned with parsing input from the user. If the user types in some other representation, I want to be sure that I recognize it as a number and treat it accordingly.

Tags: c++ visual-c++ mfc internationalization

2 answers

傲

Reply #2 · 2019-06-19 16:16

You can use std::wistringstream together with a locale to parse the integer:

#include <sstream>
#include <locale>
using namespace std;

int main()
{
  locale mylocale("");       // Empty name: construct a locale from the user's default preferences
  wistringstream wss(L"1");  // your number string
  wss.imbue( mylocale );     // Imbue that locale
  int target_int = 0;
  wss >> target_int;
  return 0;
}

For more information, see the documentation for the stream classes and for the locale class.
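To reject input that is not entirely an integer, you can check the stream state after extraction and make sure nothing but whitespace follows the number. The sketch below assumes that behavior is what you want; TryParseInt is a hypothetical helper name. As far as I understand the standard num_get facet, it only matches the characters obtained by widening "0123456789", so native digits such as Arabic-Indic U+0661 make the extraction fail even with a suitable locale imbued.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parses the whole of `text` as an int using `loc`.
// Returns false if no number is recognized, the value is out of range,
// or non-whitespace characters follow the number.
bool TryParseInt(const std::wstring& text, const std::locale& loc, int& out)
{
    std::wistringstream wss(text);
    wss.imbue(loc);
    int value = 0;
    wss >> value;               // skips leading whitespace, then reads digits
    if (wss.fail())
        return false;           // no digits recognized, or overflow
    wchar_t extra;
    if (wss >> extra)
        return false;           // something other than whitespace follows
    out = value;
    return true;
}
```

For example, TryParseInt(L"137", std::locale(""), n) parses with the user's locale; passing std::locale::classic() gives locale-independent behavior.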


在下西门庆

Reply #3 · 2019-06-19 16:20

If you are concerned about international characters, make sure you use a Unicode-aware function such as _wtoi().
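_wtoi is specific to the Microsoft C runtime and, like atoi, returns 0 both for "0" and for input that cannot be converted. A portable sketch using the standard std::wcstol, which reports where parsing stopped and signals overflow through errno, is shown here; WideToInt is a hypothetical helper name. Like atoi, it reads a leading number and stops at the first character that is not part of it, and in the "C" locale it still only recognizes the digits 0-9.

```cpp
#include <cerrno>
#include <climits>
#include <cwchar>

// Hypothetical helper: checked wide-string-to-int conversion, similar in
// spirit to _wtoi but returning false instead of silently yielding 0.
bool WideToInt(const wchar_t* text, int& out)
{
    wchar_t* end = nullptr;
    errno = 0;
    long value = std::wcstol(text, &end, 10);
    if (end == text)
        return false;                          // no digits were consumed
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return false;                          // does not fit in an int
    out = static_cast<int>(value);
    return true;
}
```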

You can also check whether UNICODE is defined, so that the code works with either character type (example adapted from MSDN):

#include <windows.h>   // TCHAR, CHAR, TEXT
#include <stdlib.h>    // atoi, wcstombs_s

int main()
{
    TCHAR tstr[4] = TEXT("137");
    int num = 0;

#ifdef UNICODE
    size_t cCharsConverted = 0;
    CHAR strTmp[2 * (sizeof(tstr) + 1)]; // Enough room for the multibyte characters
                                         // if they are two bytes long, plus a
                                         // terminating null character. See the
                                         // caution below.

    // _TRUNCATE converts as much as fits and always null-terminates strTmp.
    wcstombs_s(&cCharsConverted, strTmp, sizeof(strTmp), tstr, _TRUNCATE);
    num = atoi(strTmp);

#else

    num = atoi(tstr);

#endif

    return 0;
}

In this example, the standard C library function wcstombs translates Unicode to ASCII. The example relies on the fact that the digits 0 through 9 can always be translated from Unicode to ASCII, even if some of the surrounding text cannot. The atoi function stops at any character that is not a digit.

Your application can use the National Language Support (NLS) LCMapString function to process text that includes the native digits provided for some of the scripts in Unicode.
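The idea is to fold native digits to ASCII before parsing. As a portable illustration of that idea (not the LCMapString API itself), the hypothetical helper FoldDigitsToAscii maps a few hand-picked Unicode digit ranges to '0'-'9' so that _wtoi or std::wcstol can then parse the result. As far as I know, on Windows FoldStringW with the MAP_FOLDDIGITS flag performs this mapping for all Unicode digits; the ranges listed here are an assumption chosen for illustration and are deliberately incomplete.

```cpp
#include <string>

// Hypothetical helper: maps a few Unicode decimal-digit blocks to ASCII.
// Each block holds ten consecutive code points, digit zero through nine.
std::wstring FoldDigitsToAscii(const std::wstring& in)
{
    // Code point of digit zero for each supported block (partial list).
    static const wchar_t kZeros[] = {
        0x0660,  // ARABIC-INDIC DIGIT ZERO
        0x06F0,  // EXTENDED ARABIC-INDIC DIGIT ZERO (Persian, Urdu)
        0x0966,  // DEVANAGARI DIGIT ZERO
        0xFF10,  // FULLWIDTH DIGIT ZERO (CJK full-width input)
    };
    std::wstring out;
    out.reserve(in.size());
    for (wchar_t ch : in) {
        wchar_t mapped = ch;   // characters outside the table pass through unchanged
        for (wchar_t zero : kZeros) {
            if (ch >= zero && ch <= zero + 9) {
                mapped = static_cast<wchar_t>(L'0' + (ch - zero));
                break;
            }
        }
        out.push_back(mapped);
    }
    return out;
}
```

Non-digit characters are left in place, so a parser that stops at the first non-digit still behaves as it would on the original text.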

Caution: Using the wcstombs function incorrectly can compromise the security of your application. Make sure that the application buffer for the string of 8-bit characters is at least 2*(char_length + 1) bytes, where char_length is the length of the Unicode string. This is required because, with double-byte character sets (DBCSs), each Unicode character can map to two consecutive 8-bit characters. If the buffer does not hold the entire string, the resulting string is not null-terminated, which poses a security risk. For more information about application security, see Security Considerations: International Features.
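The 2*(char_length + 1) rule is a worst-case estimate for double-byte character sets. An alternative is to ask the conversion how many bytes it needs first: the standard std::wcsrtombs only counts when given a null destination pointer. WideToMultibyte is a hypothetical helper, and its result depends on the current C locale (as set with setlocale).

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: converts a wide string to a multibyte std::string,
// sizing the buffer from the conversion itself rather than a fixed 2x rule.
// Returns false if a character cannot be represented in the current locale.
bool WideToMultibyte(const wchar_t* src, std::string& out)
{
    std::mbstate_t state{};
    const wchar_t* p = src;
    // Null destination: nothing is stored, only the byte count is returned
    // (excluding the terminating null).
    std::size_t needed = std::wcsrtombs(nullptr, &p, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        return false;
    std::string buf(needed + 1, '\0');   // room for the terminating null
    state = std::mbstate_t{};
    p = src;
    std::wcsrtombs(&buf[0], &p, buf.size(), &state);
    buf.resize(needed);                  // drop the terminating null
    out = buf;
    return true;
}
```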
