I'm currently experiencing random floating-point errors when compiling for x86 targets using VC++ 11 (CTP Update 1). See the short example "test.cpp" below, and compile it using:
cl /GL /O2 /EHsc test.cpp /link /MACHINE:X86
The output should be 10 = 10, but it produces 10 = 0 when /GL (whole-program optimization) is enabled. The problem seems to be that get_scaling_factor() leaves its result on the x87 floating-point stack, while the calling function expects it in the SSE register XMM0.

Question: am I missing something obvious, or is this really a bug? The test program, of course, doesn't make sense on its own, as it is a stripped-down test case.
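(As a side note: to rule out a rounding artifact, one can dump the raw bit pattern of the returned value; an exact all-zero pattern points to a never-written register rather than an arithmetic error. The dump_bits helper below is purely illustrative and not part of the repro.)

#include <cstdio>
#include <cstdint>
#include <cstring>

// Illustrative helper: print the IEEE-754 bit pattern of a double.
// With the bug present, dump_bits(scale(1e9, 1)) would print an
// all-zero pattern, i.e. an exact 0.0, not a value near 10.
void dump_bits(double d)
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    std::printf("%g = 0x%016llx\n", d, static_cast<unsigned long long>(bits));
}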
test.cpp:
#include <iostream>

// Returns 10^units as T.
template <typename T>
inline T get_scaling_factor(int units)
{
    switch (units)
    {
    case 0: return 1;
    case 1: return 10;
    case 2: return 100;
    case 3: return 1000;
    case 4: return 10000;
    case 5: return 100000;
    case 6: return 1000000;
    case 7: return 10000000;
    case 8: return 100000000;
    case 9: return 1000000000;
    default: return 1;
    }
}

// Rescales value from sourceUnits to targetUnits (both powers of ten).
template <int targetUnits, typename T>
inline T scale(T value, int sourceUnits)
{
    return value * get_scaling_factor<T>(sourceUnits)
                 / get_scaling_factor<T>(targetUnits);
}

// Prevent inlining into main, keeping a real call boundary.
__declspec(noinline)
double scale(double value, int units)
{
    return scale<9>(value, units);
}

int main()
{
    std::cout << "10 = " << scale(1e9, 1) << std::endl;  // 1e9 * 10 / 1e9 = 10
}
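As a stopgap until a compiler fix ships, something like the following might dodge the bad code generation by keeping the optimizer away from the wrapper. This is an untested sketch, not a confirmed workaround:

// Untested workaround sketch: disable optimization for the affected
// function in the hope that the whole-program optimizer then keeps
// the standard calling convention for it.
#pragma optimize("", off)
__declspec(noinline)
double scale(double value, int units)
{
    return scale<9>(value, units);
}
#pragma optimize("", on)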
Update
The issue has been confirmed by Microsoft. It even affects straightforward code like this:
#include <stdio.h>

// Returns 10^a as a double.
double test(int a)
{
    switch (a)
    {
    case 0: return 1.0;
    case 1: return 10.0;
    case 2: return 100.0;
    case 3: return 1000.0;
    case 4: return 10000.0;
    case 5: return 100000.0;
    case 6: return 1000000.0;
    case 7: return 10000000.0;
    case 8: return 100000000.0;
    case 9: return 1000000000.0;
    default: return 1.0;
    }
}

int main()
{
    int nine = 9;
    double x = test(nine);  // should be 1e9
    x /= test(7);           // 1e9 / 1e7 = 100
    int val = (int)x;
    if (val == 100)
        printf("pass");
    else
        printf("fail, val is %d", val);
}