可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Solution: Apparently the culprit was the use of floor(), the performance of which turns out to be OS-dependent in glibc.

This is a followup question to an earlier one: Same program faster on Linux than Windows -- why?

I have a small C++ program, that, when compiled with nuwen gcc 4.6.1, runs much faster on Wine than Windows XP (on the same computer). The question: why does this happen?

The timings are ~15.8 and 25.9 seconds, for Wine and Windows respectively. Note that I'm talking about the same executable, not only the same C++ program.

The source code is at the end of the post. The compiled executable is here (if you trust me enough).

This particular program does nothing useful, it is just a minimal example boiled down from a larger program I have. Please see this other question for some more precise benchmarking of the original program (important!!) and the most common possibilities ruled out (such as other programs hogging the CPU on Windows, process startup penalty, difference in system calls such as memory allocation). Also note that while here I used rand() for simplicity, in the original I used my own RNG which I know does no heap-allocation.

The reason I opened a new question on the topic is that now I can post an actual simplified code example for reproducing the phenomenon.

The code:

#include <cstdlib>
#include <cmath>


int irand(int top) {
    return int(std::floor((std::rand() / (RAND_MAX + 1.0)) * top));
}

template<typename T>
class Vector {
    T *vec;
    const int sz;

public:
    Vector(int n) : sz(n) {
        vec = new T[sz];
    }

    ~Vector() {
        delete [] vec;
    }

    int size() const { return sz; }

    const T & operator [] (int i) const { return vec[i]; }
    T & operator [] (int i) { return vec[i]; }
};


int main() {
    const int tmax = 20000; // increase this to make it run longer
    const int m = 10000;
    Vector<int> vec(150);

    for (int i=0; i < vec.size(); ++i)
        vec[i] = 0;

    // main loop
    for (int t=0; t < tmax; ++t)
        for (int j=0; j < m; ++j) {
            int s = irand(100) + 1;
            vec[s] += 1;
        }

    return 0;
}

UPDATE

It seems that if I replace irand() above with something deterministic such as

int irand(int top) {
    static int c = 0;
    return (c++) % top;
}

then the timing difference disappears. I'd like to note though that in my original program I used a different RNG, not the system rand(). I'm digging into the source of that now.

UPDATE 2

Now I replaced the irand() function with an equivalent of what I had in the original program. It is a bit lengthy (the algorithm is from Numerical Recipes), but the point was to show that no system libraries are being called explictly (except possibly through floor()). Yet the timing difference is still there!

Perhaps floor() could be to blame? Or the compiler generates calls to something else?

class ran1 {
    static const int table_len = 32;
    static const int int_max = (1u << 31) - 1;

    int idum;
    int next;
    int *shuffle_table;

    void propagate() {
        const int int_quo = 1277731;

        int k = idum/int_quo;
        idum = 16807*(idum - k*int_quo) - 2836*k;
        if (idum < 0)
            idum += int_max;
    }

public:
    ran1() {
        shuffle_table = new int[table_len];
        seedrand(54321);
    }
    ~ran1() {
        delete [] shuffle_table;
    }

    void seedrand(int seed) {
        idum = seed;
        for (int i = table_len-1; i >= 0; i--) {
            propagate();
            shuffle_table[i] = idum;
        }
        next = idum;
    }

    double frand() {
        int i = next/(1 + (int_max-1)/table_len);
        next = shuffle_table[i];
        propagate();
        shuffle_table[i] = idum;
        return next/(int_max + 1.0);
    }
} rng;


int irand(int top) {
    return int(std::floor(rng.frand() * top));
}

回答1:

edit: It turned out that the culprit was floor() and not rand() as I suspected - see the update at the top of the OP's question.

The run time of your program is dominated by the calls to rand().

I therefore think that rand() is the culprit. I suspect that the underlying function is provided by the WINE/Windows runtime, and the two implementations have different performance characteristics.

The easiest way to test this hypothesis would be to simply call rand() in a loop, and time the same executable in both environments.

edit I've had a look at the WINE source code, and here is its implementation of rand():

/*********************************************************************
 *              rand (MSVCRT.@)
 */
int CDECL MSVCRT_rand(void)
{
    thread_data_t *data = msvcrt_get_thread_data();

    /* this is the algorithm used by MSVC, according to
     * http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators */
    data->random_seed = data->random_seed * 214013 + 2531011;
    return (data->random_seed >> 16) & MSVCRT_RAND_MAX;
}

I don't have access to Microsoft's source code to compare, but it wouldn't surprise me if the difference in performance was in the getting of thread-local data rather than in the RNG itself.

回答2:

Wikipedia says:

Wine is a compatibility layer not an emulator. It duplicates functions of a Windows computer by providing alternative implementations of the DLLs that Windows programs call,[citation needed] and a process to substitute for the Windows NT kernel. This method of duplication differs from other methods that might also be considered emulation, where Windows programs run in a virtual machine.[2] Wine is predominantly written using black-box testing reverse-engineering, to avoid copyright issues.

This implies that the developers of wine could replace an api call with anything at all to as long as the end result was the same as you would get with a native windows call. And I suppose they weren't constrained by needing to make it compatible with the rest of Windows.

回答3:

From what I can tell, the C standard libraries used WILL be different in the two different scenarios. This affects the rand() call as well as floor().

From the mingw site... MinGW compilers provide access to the functionality of the Microsoft C runtime and some language-specific runtimes. Running under XP, this will use the Microsoft libraries. Seems straightforward.

However, the model under wine is much more complex. According to this diagram, the operating system's libc comes into play. This could be the difference between the two.

回答4:

While Wine is basically Windows, you're still comparing apples to oranges. As well, not only is it apples/oranges, the underlying vehicles hauling those apples and oranges around are completely different.

In short, your question could trivially be rephrased as "this code runs faster on Mac OSX than it does on Windows" and get the same answer.