is there any advantage to using this code
double x;
double square = pow(x,2);
instead of this?
double x;
double square = x*x;
I prefer x*x and looking at my implementation (Microsoft) I find no advantages in pow because x*x is simpler than pow for the particular square case.
Is there any particular case where pow is superior?
FWIW, with gcc-4.2 on MacOS X 10.6 and -O3
compiler flags,
x = x * x;
and
y = pow(y, 2);
result in the same assembly code:
#include <cmath>
void test(double& x, double& y) {
x = x * x;
y = pow(y, 2);
}
Assembles to:
pushq %rbp
movq %rsp, %rbp
movsd (%rdi), %xmm0
mulsd %xmm0, %xmm0
movsd %xmm0, (%rdi)
movsd (%rsi), %xmm0
mulsd %xmm0, %xmm0
movsd %xmm0, (%rsi)
leave
ret
So as long as you're using a decent compiler, write whichever makes more sense to your application, but consider that pow(x, 2)
can never be more optimal than the plain multiplication.
std::pow is more expressive if you mean x², xx is more expressive if you mean xx, especially if you are just coding down e.g. a scientific paper and readers should be able to understand your implementation vs. the paper. The difference is subtle maybe for x*x/x², but I think if you use named functions in general, it increases code-expessiveness and readability.
On modern compilers, like e.g. g++ 4.x, std::pow(x,2) will be inlined, if it is not even a compiler-builtin, and strength-reduced to x*x. If not by default and you don't care about IEEE floating type conformance, check your compiler's manual for a fast math switch (g++ == -ffast-math).
Sidenote: It has been mentioned that including math.h increases program size. My answer was:
In C++, you #include <cmath>
, not math.h. Also, if your compiler is not stone-old, it will increase your programs size only by what you are using (in the general case), and if your implentation of std::pow just inlines to corresponding x87 instructions, and a modern g++ will strength-reduce x² with x*x, then there is no relevant size-increase. Also, program size should never, ever dictate how expressive you make your code is.
A further advantage of cmath over math.h is that with cmath, you get a std::pow overload for each floating point type, whereas with math.h you get pow, powf, etc. in the global namespace, so cmath increase adaptibility of code, especially when writing templates.
As a general rule: Prefer expressive and clear code over dubiously grounded performance and binary size reasoned code.
See also Knuth:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
and Jackson:
The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.
This question touches on one of the key weaknesses of most implementations of C and C++ regarding scientific programming. After having switched from Fortran to C about twenty years, and later to C++, this remains one of those sore spots that occasionally makes me wonder whether that switch was a good thing to do.
The problem in a nutshell:
- The easiest way to implement
pow
is Type pow(Type x; Type y) {return exp(y*log(x));}
- Most C and C++ compilers take the easy way out.
- Some might 'do the right thing', but only at high optimization levels.
- Compared to
x*x
, the easy way out with pow(x,2)
is extremely expensive computationally and loses precision.
Compare to languages aimed at scientific programming:
- You don't write
pow(x,y)
. These languages have a built-in exponentiation operator. That C and C++ have steadfastly refused to implement an exponentiation operator makes the blood of many scientific programmers programmers boil. To some diehard Fortran programmers, this alone is reason to never switch to C.
- Fortran (and other languages) are required to 'do the right thing' for all small integer powers, where small is any integer between -12 and 12. (The compiler is non-compliant if it can't 'do the right thing'.) Moreover, they are required to do so with optimization off.
- Many Fortran compilers also know how to extract some rational roots without resorting to the easy way out.
There is an issue with relying on high optimization levels to 'do the right thing'. I have worked for multiple organizations that have banned use of optimization in safety critical software. Memories can be very long (multiple decades long) after losing 10 million dollars here, 100 million there, all due to bugs in some optimizing compiler.
IMHO, one should never use pow(x,2)
in C or C++. I'm not alone in this opinion. Programmers who do use pow(x,2)
typically get reamed big time during code reviews.
In C++11 there is one case where there is an advantage to using x * x
over std::pow(x,2)
and that case is where you need to use it in a constexpr:
constexpr double mySqr( double x )
{
return x * x ;
}
As we can see std::pow is not marked constexpr and so it is unusable in a constexpr function.
Otherwise from a performance perspective putting the following code into godbolt shows these functions:
#include <cmath>
double mySqr( double x )
{
return x * x ;
}
double mySqr2( double x )
{
return std::pow( x, 2.0 );
}
generate identical assembly:
mySqr(double):
mulsd %xmm0, %xmm0 # x, D.4289
ret
mySqr2(double):
mulsd %xmm0, %xmm0 # x, D.4292
ret
and we should expect similar results from any modern compiler.
Worth noting that currently gcc considers pow a constexpr, also covered here but this is a non-conforming extension and should not be relied on and will probably change in later releases of gcc
.
x * x
will always compile to a simple multiplication. pow(x, 2)
is likely to, but by no means guaranteed, to be optimised to the same. If it's not optimised, it's likely using a slow general raise-to-power math routine. So if performance is your concern, you should always favour x * x
.
I would probably choose std::pow(x, 2)
because it could make my code refactoring easier. And it would make no difference whatsoever once the code is optimized.
Now, the two approaches are not identical. This is my test code:
#include<cmath>
double square_explicit(double x) {
asm("### Square Explicit");
return x * x;
}
double square_library(double x) {
asm("### Square Library");
return std::pow(x, 2);
}
The asm("text");
call simply writes comments to the assembly output, which I produce using (GCC 4.8.1 on OS X 10.7.4):
g++ example.cpp -c -S -std=c++11 -O[0, 1, 2, or 3]
You don't need -std=c++11
, I just always use it.
First: when debugging (with zero optimization), the assembly produced is different; this is the relevant portion:
# 4 "square.cpp" 1
### Square Explicit
# 0 "" 2
movq -8(%rbp), %rax
movd %rax, %xmm1
mulsd -8(%rbp), %xmm1
movd %xmm1, %rax
movd %rax, %xmm0
popq %rbp
LCFI2:
ret
LFE236:
.section __TEXT,__textcoal_nt,coalesced,pure_instructions
.globl __ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_
.weak_definition __ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_
__ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_:
LFB238:
pushq %rbp
LCFI3:
movq %rsp, %rbp
LCFI4:
subq $16, %rsp
movsd %xmm0, -8(%rbp)
movl %edi, -12(%rbp)
cvtsi2sd -12(%rbp), %xmm2
movd %xmm2, %rax
movq -8(%rbp), %rdx
movd %rax, %xmm1
movd %rdx, %xmm0
call _pow
movd %xmm0, %rax
movd %rax, %xmm0
leave
LCFI5:
ret
LFE238:
.text
.globl __Z14square_libraryd
__Z14square_libraryd:
LFB237:
pushq %rbp
LCFI6:
movq %rsp, %rbp
LCFI7:
subq $16, %rsp
movsd %xmm0, -8(%rbp)
# 9 "square.cpp" 1
### Square Library
# 0 "" 2
movq -8(%rbp), %rax
movl $2, %edi
movd %rax, %xmm0
call __ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_
movd %xmm0, %rax
movd %rax, %xmm0
leave
LCFI8:
ret
But when you produce the optimized code (even at the lowest optimization level for GCC, meaning -O1
) the code is just identical:
# 4 "square.cpp" 1
### Square Explicit
# 0 "" 2
mulsd %xmm0, %xmm0
ret
LFE236:
.globl __Z14square_libraryd
__Z14square_libraryd:
LFB237:
# 9 "square.cpp" 1
### Square Library
# 0 "" 2
mulsd %xmm0, %xmm0
ret
So, it really makes no difference unless you care about the speed of unoptimized code.
Like I said: it seems to me that std::pow(x, 2)
more clearly conveys your intentions, but that is a matter of preference, not performance.
And the optimization seems to hold even for more complex expressions. Take, for instance:
double explicit_harder(double x) {
asm("### Explicit, harder");
return x * x - std::sin(x) * std::sin(x) / (1 - std::tan(x) * std::tan(x));
}
double implicit_harder(double x) {
asm("### Library, harder");
return std::pow(x, 2) - std::pow(std::sin(x), 2) / (1 - std::pow(std::tan(x), 2));
}
Again, with -O1
(the lowest optimization), the assembly is identical yet again:
# 14 "square.cpp" 1
### Explicit, harder
# 0 "" 2
call _sin
movd %xmm0, %rbp
movd %rbx, %xmm0
call _tan
movd %rbx, %xmm3
mulsd %xmm3, %xmm3
movd %rbp, %xmm1
mulsd %xmm1, %xmm1
mulsd %xmm0, %xmm0
movsd LC0(%rip), %xmm2
subsd %xmm0, %xmm2
divsd %xmm2, %xmm1
subsd %xmm1, %xmm3
movapd %xmm3, %xmm0
addq $8, %rsp
LCFI3:
popq %rbx
LCFI4:
popq %rbp
LCFI5:
ret
LFE239:
.globl __Z15implicit_harderd
__Z15implicit_harderd:
LFB240:
pushq %rbp
LCFI6:
pushq %rbx
LCFI7:
subq $8, %rsp
LCFI8:
movd %xmm0, %rbx
# 19 "square.cpp" 1
### Library, harder
# 0 "" 2
call _sin
movd %xmm0, %rbp
movd %rbx, %xmm0
call _tan
movd %rbx, %xmm3
mulsd %xmm3, %xmm3
movd %rbp, %xmm1
mulsd %xmm1, %xmm1
mulsd %xmm0, %xmm0
movsd LC0(%rip), %xmm2
subsd %xmm0, %xmm2
divsd %xmm2, %xmm1
subsd %xmm1, %xmm3
movapd %xmm3, %xmm0
addq $8, %rsp
LCFI9:
popq %rbx
LCFI10:
popq %rbp
LCFI11:
ret
Finally: the x * x
approach does not require include
ing cmath
which would make your compilation ever so slightly faster all else being equal.