I want to target ARMv6 Android devices that have VFP.
I have the following line in my Android.mk file to enable VFP:
LOCAL_CFLAGS := -marm -mfloat-abi=softfp -mfpu=vfp -Wmultichar
I believe this targets ARMv5 with VFP.
I edited android-ndk-r8b\toolchains\arm-linux-androideabi-4.6\setup.mk to remove -msoft-float. I also tried with the original setup.mk.
My code works fine 99.99% of the time, but sometimes it goes crazy on ARMv6 devices.
I have special code to detect when it goes crazy.
Code
glm::vec3 D = P1 - P2;
float f1 = sqrtf(D.x*D.x + D.y*D.y + D.z*D.z);
if(!(f1 < 5)){
// f1 is greater than 5, or NaN (every comparison with NaN is false)
mylog_fmt("Crazy %f %f %f %f", P1.x, P1.y, P1.z, f1);
mylog_fmt("%f %f %f", P2.x, P2.y, P2.z);
}
LogCat:
12-14 00:59:08.214: I/APP(17091): Crazy -20.000031 0.000000 0.000000 20.000000
12-14 00:59:08.214: I/APP(17091): -20.000000 0.000000 0.000000
It calculates the distance between two points. Normally the result is 0.000031, but when "crazy mode" is on, it is 20.0.
The problem does not exist when I run it on an ARMv7 CPU; it exists only on ARMv6 CPUs.
I believe it must be a commonly known bug related to compiler settings or version. Maybe the code is missing a memory barrier.
I would like to see a reference to similar bugs, a way to solve it, or an explanation of the nature of the bug.
I also often get NaN values on ARMv6 where the same code on ARMv7 does not give NaN.
I have been debugging this code for two weeks already and searching the web. If someone could share a link to a similar problem, it would be a great help!
P.S. Here is an example of one of the compile commands. I have already tried many different settings.
Compiler Settings
c:/soft/Android/android-ndk-r8b/toolchains/arm-linux-androideabi-4.6/prebuilt/windows/bin/arm-linux-androideabi-g++
-MMD -MP -MF ./obj/local/armeabi/objs/main/sys/base.o.d -fpic -ffunction-sections -funwind-tables -fstack-protector
-D__ARM_ARCH_5__ -D__ARM_ARCH_5T__ -D__ARM_ARCH_5E__
-D__ARM_ARCH_5TE__
-march=armv5te -mtune=arm6
-mfloat-abi=softfp -mfpu=vfp
-fno-exceptions -fno-rtti -mthumb -Os -fomit-frame-pointer -fno-strict-aliasing -finline-limit=64
-Ijni/main/ -Ijni/main/sys -Ijni/main/bullet/src -Ijni/main/bullet/src/LinearMath -Ijni/main/bullet/src/BulletCollision/BroadphaseCollision
-Ijni/main/bullet/src/BulletCollision/CollisionDispatch -Ijni/main/bullet/src/BulletCollision/CollisionShapes -Ijni/main/bullet/src/BulletCollision/NarrowPhaseCollision
-Ijni/main/bullet/src/BulletDynamics/ConstraintSolver -Ijni/main/bullet/src/BulletDynamics/Dynamics -Ijni/main/../libzip/ -Ic:/soft/Android/android-ndk-r8b/sources/cxx-stl/stlport/stlport
-Ic:/soft/Android/android-ndk-r8b/sources/cxx-stl//gabi++/include -Ijni/main
-DANDROID
-marm -march=armv6 -mfloat-abi=softfp -mfpu=vfp -Wmultichar
-Wa,--noexecstack -frtti -O2 -DNDEBUG -g -Ic:/soft/Android/android-ndk-r8b/platforms/android-5/arch-arm/usr/include -c jni/main/sys/base.cpp
-o ./obj/local/armeabi/objs/main/sys/base.o
UPDATE 2
All these devices have a Qualcomm MSM7227A, which has an ARM1136JF-S core.
What I have learned so far is that the bug could be related to denormals.
I read somewhere that, unlike ARMv6, ARMv7 flushes denormals to zero by default, while on the ARM1136JF-S flush-to-zero is optional.
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0211k/DDI0211K_arm1136_r1p5_trm.pdf
I am not yet sure how to check the flush-to-zero flag on ARM.
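One way to check the flag is to read FPSCR and test its FZ bit (bit 24, the same bit the code further down sets with #16777216). Below is a minimal sketch, assuming GCC inline assembly, an assembler that accepts the UAL mnemonic vmrs, and code compiled in ARM (not Thumb) mode with VFP enabled. The helper names fpscrHasFlushToZero and readFpscr are hypothetical. On any other build, readFpscr simply reports that FPSCR is not readable, so the bit-decoding part can be tested on a desktop.

```cpp
#include <cstdint>
#include <cstdio>

// Flush-to-zero (FZ) is bit 24 of the VFP FPSCR register.
const std::uint32_t kFpscrFZ = 1u << 24;

// Pure helper: decode the FZ bit from a raw FPSCR value.
bool fpscrHasFlushToZero(std::uint32_t fpscr) {
    return (fpscr & kFpscrFZ) != 0;
}

// Reads FPSCR into *out and returns true; returns false when this build is
// not ARM mode with hardware VFP (assumption: ARM mode is needed on ARM1136).
bool readFpscr(std::uint32_t* out) {
#if defined(__arm__) && !defined(__thumb__) && !defined(__SOFTFP__)
    std::uint32_t value;
    asm volatile("vmrs %0, FPSCR" : "=r"(value));
    *out = value;
    return true;
#else
    (void)out;
    return false;
#endif
}

// Logs the current FZ state (for example from JNI_OnLoad or a test screen).
void logFlushToZeroState() {
    std::uint32_t fpscr = 0;
    if (readFpscr(&fpscr)) {
        std::printf("FPSCR=0x%08x FZ=%d\n", static_cast<unsigned>(fpscr),
                    fpscrHasFlushToZero(fpscr) ? 1 : 0);
    } else {
        std::printf("FPSCR is not readable in this build\n");
    }
}
```

Calling logFlushToZeroState() on the device before and after setting the bit should show whether the write took effect.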
UPDATE 3
This CPU's VFP is called VFP11.
I found the --vfp11-denorm-fix option; there is also --vfp-denorm-fix. They work around an erratum in VFP11 CPUs, which looks like my problem.
I found a few posts about the VFP11 erratum and hope this will fix the code.
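It seems I have identified the bug: it is the VFP11 (ARMv6 coprocessor) denormal bug.
Denormal numbers are very small numbers: nonzero floats whose magnitude is below the smallest normal float, FLT_MIN (about 1.18e-38).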
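For reference, standard C++ can classify such values directly. This is a small sketch; isDenormal is a hypothetical helper name, and it assumes the code is not built with -ffast-math or run with flush-to-zero enabled (either can turn denormals into zero).

```cpp
#include <cfloat>
#include <cmath>

// True when f is denormal (subnormal): nonzero, biased exponent field 0,
// so |f| < FLT_MIN. Zero, normal numbers, infinities and NaN return false.
bool isDenormal(float f) {
    return std::fpclassify(f) == FP_SUBNORMAL;
}
```

For example, FLT_MIN / 2.0f is denormal, while FLT_MIN and 0.0f are not.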
I get these numbers in physics code that implements a spring with damping:
force1 = (Center - P1) * k1 // force1 directed to center
force2 = - Velocity * k2 // force2 directed against velocity
Object->applyForce(force1)
Object->applyForce(force2)
Both forces become very small as the object approaches Center, and I end up with denormal values.
I can rewrite the spring and damping code, but I can't rewrite the whole BulletPhysics library or all the math code and predict every (even internal) occurrence of a denormal number.
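To see why a damped spring inevitably produces denormals, here is a hypothetical 1-D version of the pseudocode above, assuming unit mass, Center = 0 and semi-implicit Euler integration (the real code uses Bullet's applyForce, which this sketch does not model). The amplitude decays exponentially, so every value eventually passes through the denormal range on its way to zero.

```cpp
#include <cmath>

// Simulates force1 = (Center - P) * k1 and force2 = -Velocity * k2 in 1-D.
// Returns the first step at which any force or state value is denormal,
// or -1 if none appears within maxSteps.
long firstDenormalStep(float x0, float k1, float k2, float dt, long maxSteps) {
    const float center = 0.0f;
    float x = x0;    // position
    float v = 0.0f;  // velocity
    for (long step = 0; step < maxSteps; ++step) {
        float force1 = (center - x) * k1;  // directed to the center
        float force2 = -v * k2;            // directed against the velocity
        v += (force1 + force2) * dt;       // unit mass: acceleration == force
        x += v * dt;
        const float values[] = {force1, force2, v, x};
        for (float value : values) {
            if (std::fpclassify(value) == FP_SUBNORMAL) return step;
        }
    }
    return -1;
}
```

With x0 = 1, k1 = 10, k2 = 2 and dt = 0.01, a desktop run reaches a denormal long before 100000 steps; on a VFP11 those are exactly the operands the erratum is about.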
The linker has the fix options --vfp11-denorm-fix and --vfp-denorm-fix:
http://sourceware.org/binutils/docs-2.19/ld/ARM.html
The NDK linker supports --vfp11-denorm-fix.
This option helps. The code looks more reliable, but it does not fix the problem 100%; I see fewer bugs now.
But if I wait for the spring to stabilize the object, I eventually get denorm -> NaN.
I have to wait longer, but the same problems appear.
If you know a solution that fixes the code the way --vfp11-denorm-fix should, I will give you the bounty.
I tried both --vfp11-denorm-fix=scalar and --vfp11-denorm-fix=vector.
Flush to Zero bit
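// Sets the flush-to-zero (FZ) bit, bit 24 of FPSCR (16777216 == 1 << 24).
// "volatile" keeps GCC from deleting the asm, because its output x is never used.
int x;
// compiles in ARM mode
asm volatile(
    "vmrs %[result], FPSCR \r\n"
    "orr %[result], %[result], #16777216 \r\n"
    "vmsr FPSCR, %[result]"
    : [result] "=r" (x) : :
);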
Not sure why, but it requires LOCAL_ARM_MODE := arm in Android.mk (probably because the ARM1136 only supports Thumb-1, which has no VFP instructions).
Maybe -mfpu=vfp-d16 instead of just vfp is required.
Manually clear denormal numbers
I have the spring code described above.
I improved it by clearing denormal numbers manually, without using the FPU, with the following function:
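inline void fixDenorm(float & f){
    // Reinterpret the float's bits as an integer (GCC supports union type punning).
    union FloatInt32 {
        unsigned int u32;
        float f32;
    };
    FloatInt32 fi;
    fi.f32 = f;
    // Bits 23..30 hold the 8-bit biased exponent; 0 means zero or denormal.
    unsigned int exponent = (fi.u32 >> 23) & ((1 << 8) - 1);
    if(exponent == 0)
        f = 0.f;
}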
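A variant of the same check can read the bit pattern with std::memcpy instead of a union. Union type punning is undefined behavior in standard C++ (GCC documents it as supported, so the function above works with the NDK); memcpy is the portable idiom and typically compiles to a single register move. The name fixDenormMemcpy is hypothetical, and this is a sketch, not a tested fix for the VFP11 erratum.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit IEEE float");

// Same rule as fixDenorm: exponent field 0 means zero or denormal, so the
// value is replaced by +0. The exponent test itself uses integer operations.
inline void fixDenormMemcpy(float& f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;  // 8-bit biased exponent
    if (exponent == 0) {
        f = 0.0f;
    }
}
```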
The original code failed after 15-90 seconds from start, in many places.
The current code showed an issue, possibly related to this bug, in only one place after 10 minutes of physics simulation.
Reference to bug and fix
http://sourceware.org/ml/binutils/2006-12/msg00196.html
They say that GCC uses only scalar code, so --vfp11-denorm-fix=scalar is enough.
It adds one extra instruction, which slows things down. But even --vfp11-denorm-fix=vector, which adds two extra instructions, is not enough.
The problem is not easy to reproduce. On phones with a higher frequency (800 MHz) I see it more often than on slower ones (600 MHz). It is possible that the fix was made when there were no fast CPUs on the market.
We have many files in the project, and every configuration takes around 10 minutes to compile.
Testing the current state of the fix requires ~10 minutes of playing on the phone, and we heat the phone under a lamp because a hot phone shows errors faster.
I want to test different configurations and report which fix is most effective, but right now we have to add a hack to kill the last bug that is possibly related to denorms.
I expected to find a silver bullet that would fix it, but only -msoft-float (with a 10x performance degradation) or running the app on ARMv7 does.
After I replaced the previous fixDenorm function with the new fixDenormE in the spring/damping code, and also applied the new function to the ViewMatrix, I got rid of the last bug.
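#include <math.h> // fabsf

// Zeroes denormals (detected from the raw exponent bits) and also any value
// whose magnitude is below epsilon.
inline void fixDenormE(float & f, float epsilon = 1e-8f){
    union Data32 {
        unsigned int u32;
        float f32;
    };
    Data32 d;
    d.f32 = f;
    unsigned int exponent = (d.u32 >> 23) & ((1 << 8) - 1);
    if(exponent == 0)
        f = 0.f;
    if(fabsf(f) < epsilon){
        f = 0.f;
    }
}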
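Applying the same rule to a whole matrix can be done with a small loop over its floats. This is a sketch with a hypothetical helper, fixDenormArray, that follows fixDenormE's logic but reads the bits with std::memcpy. With GLM, glm::value_ptr (from glm/gtc/type_ptr.hpp) gives a pointer to the 16 contiguous floats of a glm::mat4, so a call could look like fixDenormArray(glm::value_ptr(viewMatrix), 16).

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Zeroes every element that is denormal (exponent field 0) or whose
// magnitude is below epsilon, as fixDenormE does for a single float.
inline void fixDenormArray(float* values, std::size_t count, float epsilon = 1e-8f) {
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &values[i], sizeof bits);
        const bool denormalOrZero = ((bits >> 23) & 0xFFu) == 0;
        if (denormalOrZero || std::fabs(values[i]) < epsilon) {
            values[i] = 0.0f;
        }
    }
}
```

The epsilon cut is the part that changes results: any matrix entry legitimately smaller than 1e-8 is also zeroed, which seems acceptable for a view matrix but may not be for every use.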
This page has an interesting discussion on ARM FPU options: VfpComparison
I think if you want to build for ARMv6, you might do this: -march=armv6 -mcpu=generic-armv6 -mfloat-abi=softfp (and leave out the -mfpu option). If you are not specifically targeting the processor you mentioned above, generic ARMv6 doesn't have a guaranteed FPU.
Another option is to try -mfloat-abi=hard, on the theory that there is a compiler bug somewhere around softfp.
Also check your code for stack corruption and the like; it is possible that you clobber floating-point values when they are passed.
P.S. You might also want to try out a floating-point tester such as TestFloat or the venerable netlib paranoia. While you have an example of floating point failing on this particular processor with these compiler options, you don't know how widespread the problem is. It could be worse than you think :)
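Before reaching for those tools, a few gradual-underflow checks in their spirit (this is not TestFloat or paranoia itself) can be run directly on the device. The function name denormalSelfTest is hypothetical; it returns the number of failed checks, and volatile is used so the compiler cannot fold the arithmetic at compile time. Note that with the FZ bit set these checks are expected to fail, because denormal results are then flushed to zero by design.

```cpp
#include <cfloat>
#include <cmath>

// Returns the number of failed IEEE 754 gradual-underflow checks (0 = all passed).
int denormalSelfTest() {
    volatile float minNormal = FLT_MIN;  // 2^-126
    volatile float two = 2.0f;
    int failures = 0;

    float half = minNormal / two;                  // 2^-127: exact denormal
    if (std::fpclassify(half) != FP_SUBNORMAL) ++failures;
    if (half * two != minNormal) ++failures;       // exact round trip
    if (half + half != minNormal) ++failures;

    volatile float a = minNormal * 1.5f;           // 1.5 * 2^-126: still normal
    float diff = a - minNormal;                    // 2^-127: exact denormal result
    if (diff == 0.0f) ++failures;                  // x != y must give x - y != 0
    if (diff != half) ++failures;

    return failures;
}
```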