TL;DR Is the optimizer of VS2013 confused or are my measurements wrong or does the global dummy in fact need to be volatile to make the test valid or ____ ?
Disclaimer: This is mostly out of "academic" interest, I would not expect the differences I see to really affect any production code.
Intro: Some recent measurements of mine then led me to this question because I saw significant differences between std::vector<std::unique_ptr<T> >
and boost::ptr_vector
on VS2013. (also see comments there)
It would appear that for my specific test case, accessing elements in a boost::ptr_vector can be 50% faster than using a vector of unique_ptr!
My Test code is here: http://coliru.stacked-crooked.com/a/27dc2f1b91380cca (I'll refrain from also including it in this question, I'll include snippets below)
gcc 4.8 doesn't report any differences, so this is a VS2013 thing.
Start... The timings are as follows for accessing all (1000000) elements 200 times: * St6vectorISt10unique_ptrIjSt14default_deleteIjEESaIS3_EE: 1764 ms * N5boost10ptr_vectorIjNS_20heap_clone_allocatorESaIPvEEE: 1781 ms Dummy output: 500000
My Timings for exactly the test code linked to are:
Start... The timings are as follows for accessing all (1.000.000) elements 200 times: * class std::vector<....>: 344 ms * class boost::ptr_vector<unsigned int,....>: 216 ms Dummy output: 500.000
The test loop looks like this, I'll also keep the lengthy comment there which explains what I see:
template<typename C>
void RunContainerAccess(C& c) {
for (size_t i = 0; i != loop_count; ++i) {
for (auto& e : c) {
// This is relevant:
// If the if-condition is present, VC++2013 will show
// approx. the same runtime for both cases. However,
// if the only line in this loop is assigning the element
// to the pGlobalDummy, *then* ptr_vector will run approx. 50%
// faster than the unique_vector version!
//
// g++-4.8 does not show this behaviour
//
// Note: VS2013 commmand line: (release; /O2; no whole prg opt)
// /GS /analyze- /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"Release\vc120.pdb" /fp:precise /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_LIB" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oy- /Oi /MD /Fa"Release\" /EHsc /nologo /Fo"Release\" /Fp"Release\simple.pch"
//
// Note: http://coliru.stacked-crooked.com/ command line:
// g++-4.8 -std=c++11 -O2 -Wall -pedantic -pthread main.cpp && ./a.out
// if (pGlobalDummy)
pGlobalDummy = PtrShim(e);
}
}
}
If the only line in the loop is accessing the element (putting the ptr into a global dummy), then it would appear that the VS2013 optimizer does something weird. When the if (pGlobalDummy)
is present, both cases are the same.
Can anyone share some light on this?
Thanks to Howard's answer I did find that adding volatile
to the global dummy make a difference, i.e. when the global dumy is volatile like so:
extern MyType* volatile pGlobalDummy;
MyType* volatile pGlobalDummy = nullptr;
The loops run a bit slower, but run exactly the same. Should volatile make a difference here? That is, is the test even valid without volatile?