stod does not work correctly with boost::locale

Posted 2019-02-18 10:55

Question:

I am trying to use boost::locale and std::stod together in a German locale where a comma is the decimal separator. Consider this code:

boost::locale::generator gen;

std::locale loc("");  // (1)
//std::locale  loc = gen("");  // (2)

std::locale::global(loc);
std::cout.imbue(loc);

std::string s = "1,1";  //float string in german locale!
double d1 = std::stod(s);
std::cout << "d1: " << d1 << std::endl;

double d2 = 2.2;
std::cout << "d2: " << d2 << std::endl;

std::locale loc("") creates the correct locale and the output is

d1: 1,1
d2: 2,2

as I expect. When I comment out line (1) and uncomment line (2), the output is

d1: 1
d2: 2.2

The result for d2 is to be expected. As far as I understand, boost::locale wants me to explicitly specify that d2 should be formatted as a number, and doing

std::cout << "d2: " << boost::locale::as::number << d2 << std::endl;

fixes the output to 2,2 again. The problem is that std::stod does not consider 1,1 as a valid floating point number anymore and truncates it to 1.

My question is: why does std::stod stop working when I generate my locale with boost::locale?

Additional information: I am using VC++2015, Boost 1.60, no ICU, Windows 10

Update:

I noticed that the problem is fixed when I set the global locale twice, first with std::locale("") and then with boost:

std::locale::global(std::locale(""));
bl::generator gen;
std::locale::global(gen(""));

I have no idea why it behaves this way, though!

Answer 1:

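Long story short: installing a boost::locale-generated locale with std::locale::global changes only the global c++-locale object, but not the C-locale. stod uses the C-locale and not the global c++-locale object. Installing std::locale("") with std::locale::global changes both: the global c++-locale object and the C locale. The reason is that std::locale::global calls std::setlocale only for locales that have a name, which apparently is not the case for the locale boost::locale generates.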
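
The effect can be reproduced without Boost and without a German locale installed. The sketch below is only an illustration: CommaDecimal, Observation and observe_unnamed_global_locale are made-up names, and the custom facet merely stands in for whatever boost::locale generates. What the C locale does when an unnamed locale is installed is implementation-defined; the common standard libraries typically leave it alone.

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: classic number formatting, but with
// ',' as the decimal separator.
struct CommaDecimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

struct Observation {
    std::string cpp_locale_name;  // name of the global C++ locale
    std::string c_locale_name;    // C locale as reported by setlocale
    std::string streamed;         // 1.1 written through a fresh stream
    double parsed;                // result of std::stod("1,1")
};

// Installs an unnamed comma-decimal locale as the global C++ locale,
// records what streams and std::stod see, then restores the previous locale.
Observation observe_unnamed_global_locale() {
    std::locale previous = std::locale::global(
        std::locale(std::locale::classic(), new CommaDecimal));

    Observation result;
    result.cpp_locale_name = std::locale().name();
    result.c_locale_name = std::setlocale(LC_ALL, nullptr);

    std::ostringstream out;  // copies the new global C++ locale
    out << 1.1;
    result.streamed = out.str();

    result.parsed = std::stod("1,1");  // parsed according to the C locale

    std::locale::global(previous);
    return result;
}
```

Called in a fresh program, this should report an unnamed ("*") global c++ locale and a stream that writes 1,1, while the C locale is still C and std::stod("1,1") stops at the comma.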


The whole story: std::locale is a subtle thing and responsible for a lot of debugging!

Let's start with the c++ class std::locale:

  std::locale loc("de_DE.utf8");  
  std::cout<<loc.name()<<"\n\n\n";

creates the German locale (if it is available on your machine, otherwise it throws), which results in de_DE.utf8 on the console.

However it does not change the global c++ locale object, which is created at the start-up of your program and is classical ("C"). The constructor of std::locale without arguments returns a copy of the global state:

...
  std::locale loc2;
  std::cout<<loc2.name()<<"\n\n\n";

Now you should see C if nothing messed up your locale before. std::locale("") would do some magic, find out the preferences of the user and return them as an object, without changing the global state.

You can change the global state with std::locale::global:

  std::locale::global(loc);
  std::locale loc3;
  std::cout<<loc3.name()<<"\n\n\n";

The default constructor results this time in de_DE.utf8 on the console. We can restore the global state to the classical by calling:

  std::locale::global(std::locale::classic());
  std::locale loc4;
  std::cout<<loc4.name()<<"\n\n\n";

which should give you C again.
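
Since std::locale::global returns the locale it replaces, the set-and-restore dance can be wrapped in a small guard object. This is just a sketch; ScopedGlobalLocale and global_name_while_installed are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: installs a locale as the global C++ locale for the
// lifetime of the object and puts the previous one back on destruction.
class ScopedGlobalLocale {
public:
    explicit ScopedGlobalLocale(const std::locale& loc)
        : previous_(std::locale::global(loc)) {}
    ~ScopedGlobalLocale() { std::locale::global(previous_); }

    ScopedGlobalLocale(const ScopedGlobalLocale&) = delete;
    ScopedGlobalLocale& operator=(const ScopedGlobalLocale&) = delete;

private:
    std::locale previous_;  // what std::locale::global returned
};

// Example use: name of the global locale while the guard is alive.
std::string global_name_while_installed(const std::locale& loc) {
    ScopedGlobalLocale guard(loc);
    return std::locale().name();
}
```

Keep in mind that the global locale is shared by the whole process, so a guard like this is not a substitute for synchronisation in multi-threaded code.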

Now, when std::cout is created it clones its locale from the global c++ state (here we do it with stringstreams, but it is the same). Later changes of the global state do not affect the stream:

  //classical formatting:
  std::stringstream c_stream;

  //German formatting:
  std::locale::global(std::locale("de_DE.utf8"));
  std::stringstream de_stream;

  //same global locale, different results:
  c_stream<<1.1;
  de_stream<<1.1;

  std::cout<<c_stream.str()<<" vs. "<<de_stream.str()<<"\n";

This gives you 1.1 vs. 1,1: the first is the classical format, the second the German one.
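
If de_DE.utf8 is not installed on your machine, the same behaviour can be observed with a hand-made locale. In this sketch, CommaDecimal and format_before_and_after_global_change are made-up names; the facet only swaps the decimal separator and is not a full German locale.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: classic formatting with ',' as the decimal separator.
struct CommaDecimal : std::numpunct<char> {
    char do_decimal_point() const override { return ','; }
};

// Returns "<before> vs. <after>", where each part is 1.1 written through a
// stream constructed before / after the global C++ locale was replaced.
// Assumes the global locale is the classic one when called.
std::string format_before_and_after_global_change() {
    std::ostringstream before;  // copies the current global locale

    std::locale previous = std::locale::global(
        std::locale(std::locale::classic(), new CommaDecimal));
    std::ostringstream after;   // copies the comma-decimal global locale

    before << 1.1;  // still formatted with '.'
    after << 1.1;   // formatted with ','

    std::locale::global(previous);  // leave the global state as we found it
    return before.str() + " vs. " + after.str();
}
```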

You can change the local locale-object of a stream with imbue(std::locale::classic()). It goes without saying that this doesn't change the global state:

  de_stream.imbue(std::locale::classic());
  de_stream<<" vs. "<<1.1;
  std::cout<<de_stream.str()<<"\n";
  std::cout<<"global c++ state: "<<std::locale().name()<<"\n";

and you see:

1,1 vs. 1.1
global c++ state: de_DE.utf8

Now we are coming to std::stod. As you can imagine, it uses the global c++ locale state (not entirely true, bear with me) and not the (private) state of the cout-stream:

std::cout<<std::stod("1.1")<<" vs. "<<std::stod("1,1")<<"\n";

gives you 1 vs. 1.1. The global state is still "de_DE.utf8", so the first parse stops at '.', while the local state of std::cout is still "C", so both results are printed with a dot. After restoring the global state we get the classical behaviour:

  std::locale::global(std::locale::classic());
  std::cout<<std::stod("1.1")<<" vs. "<<std::stod("1,1")<<"\n";

Now the German "1,1" is not parsed properly: 1.1 vs. 1

Now you might think we are done, but there is more - I promised to tell you about std::stod.

Next to the global c++ locale there is the so-called (global) C locale (it comes from the C language and is not to be confused with the classical "C" locale). Each time we changed the global c++ locale, the C locale was changed too.

Getting/setting of the C locale can be done with std::setlocale(...). To query the current value run:

std::cout<<"(global) C locale is "<<std::setlocale(LC_ALL,NULL)<<"\n";

to see (global) C locale is C. To set the C locale run:

  const char* res = std::setlocale(LC_ALL,"de_DE.utf8");
  assert(res!=NULL); //the call stays outside assert, so it also runs with NDEBUG
  std::cout<<"(global) C locale is "<<std::setlocale(LC_ALL,NULL)<<"\n";

which yields (global) C locale is de_DE.utf8. But what is now the global c++ locale?

std::cout<<"global c++ state: "<<std::locale().name()<<"\n";

As you may expect, C knows nothing about c++ global locale and leaves it unchanged: global c++ state: C.
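
To keep an eye on both states while debugging, a few tiny helpers can be handy. This is a minimal sketch with hypothetical names (try_set_c_locale, c_locale_name, cpp_global_locale_name); it only wraps std::setlocale and std::locale.

```cpp
#include <cassert>
#include <clocale>
#include <locale>
#include <string>

// Tries to switch the C locale; returns false if the name is not available.
// The setlocale call is not wrapped in assert(), so it also runs when
// NDEBUG is defined.
bool try_set_c_locale(const char* name) {
    return std::setlocale(LC_ALL, name) != nullptr;
}

// Name of the current C locale (what strtod, printf and friends use).
std::string c_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Name of the current global C++ locale (what new streams copy).
std::string cpp_global_locale_name() {
    return std::locale().name();
}
```

After a successful try_set_c_locale("de_DE.utf8"), c_locale_name() should report the German locale while cpp_global_locale_name() still says C. A failed call leaves the C locale as it was.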

Now we are not in Kansas any more! The old C functions use the C-locale and the new c++ functions use the global c++ locale. Brace yourself for funny debugging!

What would you expect

std::cout<<"C: "<<std::stod("1.1")<<" vs. DE :"<<std::stod("1,1")<<"\n";

to do? std::stod is a brand-new c++11 function after all and it should use global c++ locale! Think again...:

C: 1 vs. DE :1.1

It gets the German format right, because the C-locale is set to 'de_DE.utf8' and it uses old C-style functions under the hood.
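
The standard specifies std::stod in terms of strtod, which is why only the C locale matters here. If you really want to keep using std::stod, one option is to switch LC_NUMERIC just for the duration of the call. The helper below is a sketch under that assumption; stod_with_c_locale is a made-up name, and because the C locale is process-wide it is not safe to use while other threads parse or format numbers.

```cpp
#include <cassert>
#include <clocale>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses `text` with std::stod while LC_NUMERIC is
// temporarily set to `c_locale_name`, then restores the previous setting.
double stod_with_c_locale(const std::string& text, const char* c_locale_name) {
    // Copy the current name: the pointer returned by setlocale may be
    // invalidated by the next setlocale call.
    const char* current = std::setlocale(LC_NUMERIC, nullptr);
    const std::string saved = current ? current : "C";

    if (std::setlocale(LC_NUMERIC, c_locale_name) == nullptr) {
        throw std::runtime_error("requested C locale is not available");
    }
    try {
        const double value = std::stod(text);
        std::setlocale(LC_NUMERIC, saved.c_str());
        return value;
    } catch (...) {
        // std::stod throws on unparsable input; restore before rethrowing.
        std::setlocale(LC_NUMERIC, saved.c_str());
        throw;
    }
}
```

On a machine where de_DE.utf8 is installed, stod_with_c_locale("1,1", "de_DE.utf8") would be expected to return 1.1 regardless of the global c++ locale.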

Just for the sake of completeness, the std::streams use the global c++ locale:

  std::stringstream stream;//creating with global c++ locale
  stream<<1.1;
  std::cout<<"I'm still in 'C' format: "<<stream.str()<<"\n";

gives you: I'm still in 'C' format: 1.1.

Edit: An alternative method to parse a string without messing with the global locale or being disturbed by it:

bool s2d(const std::string &str, double  &val, const std::locale &loc=std::locale::classic()){

  std::stringstream ss(str);
  ss.imbue(loc);
  ss>>val;
  return ss.eof() && //all characters interpreted
         !ss.fail(); //nothing went wrong
}

Consider the following tests:

  double d=0;
  std::cout<<"1,1 parsed with German locale successfully :"<<s2d("1,1", d, std::locale("de_DE.utf8"))<<"\n";
  std::cout<<"value retrieved: "<<d<<"\n\n";

  d=0;
  std::cout<<"1,1 parsed with Classical locale successfully :"<<s2d("1,1", d, std::locale::classic())<<"\n";
  std::cout<<"value retrieved: "<<d<<"\n\n";

  d=0;
  std::cout<<"1.1 parsed with German locale successfully :"<<s2d("1.1", d, std::locale("de_DE.utf8"))<<"\n";
  std::cout<<"value retrieved: "<<d<<"\n\n";

  d=0;
  std::cout<<"1.1 parsed with Classical locale successfully :"<<s2d("1.1", d, std::locale::classic())<<"\n";
  std::cout<<"value retrieved: "<<d<<"\n\n";

Only the first and the last conversions are successful:

1,1 parsed with German locale successfully :1
value retrieved: 1.1

1,1 parsed with Classical locale successfully :0
value retrieved: 1

1.1 parsed with German locale successfully :0
value retrieved: 11

1.1 parsed with Classical locale successfully :1
value retrieved: 1.1

std::stringstream may not be the fastest, but it has its merits...