Android NDK: ARMv6 + VFP devices. Wrong calculation

Posted 2019-04-13 23:46

放荡不羁爱自由

Question:

I want to target Android devices that have an ARMv6 CPU with VFP.

I have the following line in my Android.mk file to enable VFP:

LOCAL_CFLAGS    := -marm -mfloat-abi=softfp -mfpu=vfp -Wmultichar

I believe this targets ARMv5 with VFP.

I edited android-ndk-r8b\toolchains\arm-linux-androideabi-4.6\setup.mk to remove -msoft-float. I also tried the original setup.mk.

My code works fine 99.99% of the time, but sometimes it goes crazy on ARMv6 devices. I have special code to detect when that happens.

Code

glm::vec3 D = P1 - P2;
float f1 = sqrtf(D.x*D.x + D.y*D.y + D.z*D.z);
if(!(f1 < 5)){
    // f1 is 5 or bigger, or NaN
    mylog_fmt("Crazy %f %f %f %f", P1.x, P1.y, P1.z, f1);
    mylog_fmt("%f %f %f", P2.x, P2.y, P2.z);
}

LogCat:

12-14 00:59:08.214: I/APP(17091): Crazy -20.000031 0.000000 0.000000 20.000000
12-14 00:59:08.214: I/APP(17091): -20.000000 0.000000 0.000000

It calculates the distance between two points. Usually the result is 0.000031, but when crazy mode is on it is 20.0.
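For reference, the expected result for the logged points can be checked on a desktop machine. This is only a minimal sketch: Vec3 is a hypothetical stand-in for glm::vec3, and pointDistance and loggedDistance are hypothetical helper names; the coordinates are the ones printed in the log.

```cpp
#include <math.h>

// Hypothetical stand-in for glm::vec3, so the sketch has no dependencies.
struct Vec3 {
    float x, y, z;
};

// Euclidean distance, using the same formula as the failing code.
inline float pointDistance(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return sqrtf(dx * dx + dy * dy + dz * dz);
}

// Distance between the two points printed in the log.
inline float loggedDistance() {
    const Vec3 p1 = {-20.000031f, 0.0f, 0.0f};
    const Vec3 p2 = {-20.0f, 0.0f, 0.0f};
    return pointDistance(p1, p2);
}
```

With correct floating-point arithmetic, loggedDistance() returns about 0.00003, so a result of 20.0 looks as if the subtraction P1 - P2 had produced P1 itself (or P2 had been treated as zero).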

The problem does not exist when I run the code on an ARMv7 CPU. It exists on ARMv6 CPUs only.

I believe it must be a commonly known bug related to compiler settings or the compiler version. Maybe the code is missing a memory barrier.

I would like to see a reference to similar bugs, a way to solve it, or an explanation of the nature of the bug.

I also often get NaN values on ARMv6 where the same code on ARMv7 does not produce NaN.

I have been debugging this code and searching the web for 2 weeks already. If someone could share a link to a similar problem, it would be a great help!

PS. Here is an example of one of the compile commands. I have tried many different settings already.

Compiler Settings

c:/soft/Android/android-ndk-r8b/toolchains/arm-linux-androideabi-4.6/prebuilt/windows/bin/arm-linux-androideabi-g++
-MMD -MP -MF ./obj/local/armeabi/objs/main/sys/base.o.d -fpic -ffunction-sections -funwind-tables -fstack-protector 
-D__ARM_ARCH_5__ -D__ARM_ARCH_5T__ -D__ARM_ARCH_5E__
-D__ARM_ARCH_5TE__  
-march=armv5te -mtune=arm6 
-mfloat-abi=softfp -mfpu=vfp
-fno-exceptions -fno-rtti -mthumb -Os -fomit-frame-pointer -fno-strict-aliasing -finline-limit=64 
-Ijni/main/ -Ijni/main/sys -Ijni/main/bullet/src -Ijni/main/bullet/src/LinearMath -Ijni/main/bullet/src/BulletCollision/BroadphaseCollision 
-Ijni/main/bullet/src/BulletCollision/CollisionDispatch -Ijni/main/bullet/src/BulletCollision/CollisionShapes -Ijni/main/bullet/src/BulletCollision/NarrowPhaseCollision 
-Ijni/main/bullet/src/BulletDynamics/ConstraintSolver -Ijni/main/bullet/src/BulletDynamics/Dynamics -Ijni/main/../libzip/ -Ic:/soft/Android/android-ndk-r8b/sources/cxx-stl/stlport/stlport 
-Ic:/soft/Android/android-ndk-r8b/sources/cxx-stl//gabi++/include -Ijni/main 
-DANDROID

-marm -march=armv6 -mfloat-abi=softfp -mfpu=vfp -Wmultichar

-Wa,--noexecstack  -frtti  -O2 -DNDEBUG -g   -Ic:/soft/Android/android-ndk-r8b/platforms/android-5/arch-arm/usr/include -c  jni/main/sys/base.cpp
-o ./obj/local/armeabi/objs/main/sys/base.o

UPDATE 2

All these devices have a Qualcomm MSM7227A, which has an ARM1136JF-S core.

What I have learnt so far is that the bug could be related to denormals. I read somewhere that one difference between ARMv7 and ARMv6 is that ARMv7 flushes denormals to zero by default, while on the ARM1136JF-S this is optional. http://infocenter.arm.com/help/topic/com.arm.doc.ddi0211k/DDI0211K_arm1136_r1p5_trm.pdf

I am not yet sure how to verify the Flush-to-Zero flag on ARM.
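Two ways to check this could be sketched as follows. The first decodes an FPSCR value that was read elsewhere (for example with the vmrs instruction); bit 24 of FPSCR is the Flush-to-Zero (FZ) bit. The second is a behavioural probe that needs no register access. The names kFpscrFlushToZeroBit, fpscrHasFlushToZero and denormalsAreFlushed are hypothetical, and the probe assumes 32-bit IEEE 754 floats.

```cpp
#include <float.h>
#include <stdint.h>

// Bit 24 of the VFP FPSCR register is the Flush-to-Zero (FZ) control bit.
static const uint32_t kFpscrFlushToZeroBit = 1u << 24;

// Decodes an FPSCR value that was obtained elsewhere.
inline bool fpscrHasFlushToZero(uint32_t fpscr) {
    return (fpscr & kFpscrFlushToZeroBit) != 0;
}

// Behavioural probe: halving the smallest normal float yields a denormal,
// unless the FPU flushes denormal results to zero.
// volatile keeps the compiler from folding the multiplication at compile time.
inline bool denormalsAreFlushed() {
    volatile float smallestNormal = FLT_MIN;
    volatile float half = 0.5f;
    volatile float result = smallestNormal * half;
    return result == 0.0f;
}
```

Calling denormalsAreFlushed() once at startup and logging the result on both the ARMv6 and the ARMv7 device would show whether the two really run in different modes.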

UPDATE 3

This CPU's VFP is called VFP11. I found the --vfp11-denorm-fix option; there is also --vfp-denorm-fix. They work around an erratum in VFP11 CPUs. This looks like my problem. I found a few posts about the VFP11 erratum. I hope it will fix the code.

Answer 1:

It seems I have identified the bug.

It is the denorm bug in VFP11 (the ARMv6 floating-point coprocessor). Denormal numbers are very small numbers.

I get these numbers in physics code that implements a spring with damping:

force1 = (Center - P1) * k1;        // force1 is directed toward the center
force2 = -Velocity * k2;            // force2 is directed against the velocity (damping)
Object->applyForce(force1);
Object->applyForce(force2);

Both forces become very small when the object reaches Center, and I end up with denormal values.
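How quickly a decaying quantity reaches the denormal range can be illustrated with a small sketch. stepsUntilNotNormal is a hypothetical helper, not part of Bullet or of my code; it multiplies a value by a constant damping factor until the magnitude drops below FLT_MIN, the smallest normal float.

```cpp
#include <float.h>
#include <math.h>

// Counts how many damping steps it takes for a value to drop below the
// smallest normal float (FLT_MIN), i.e. to become denormal or zero.
// maxSteps guards against an endless loop when damping >= 1.
inline int stepsUntilNotNormal(float value, float damping, int maxSteps) {
    int steps = 0;
    while (steps < maxSteps && fabsf(value) >= FLT_MIN) {
        value *= damping;
        ++steps;
    }
    return steps;
}
```

Starting from 1.0f with a damping factor of 0.5f, the value leaves the normal range after 127 steps, because 2^-127 is below FLT_MIN = 2^-126. A spring that settles exponentially will therefore always reach denormals eventually unless something clamps it.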

I can rewrite the spring and damping code, but I can't rewrite the whole of BulletPhysics or all of the math code and predict every (even internal) occurrence of a denormal number.

The linker has the fix options --vfp11-denorm-fix and --vfp-denorm-fix: http://sourceware.org/binutils/docs-2.19/ld/ARM.html

The NDK linker has --vfp11-denorm-fix. This option helps. The code looks more reliable, but it does not fix the problem 100%.

I see fewer bugs now.

But if I wait for the spring to stabilize the object, I eventually still get denorm -> NaN.

I have to wait longer, but the same problems occur.

If you know a solution that fixes the code the way --vfp11-denorm-fix should, I will give you the bounty.

I tried both --vfp11-denorm-fix=scalar and --vfp11-denorm-fix=vector.

Flush to Zero bit

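      int x;
      // Sets the Flush-to-Zero (FZ) bit, bit 24, in FPSCR.
      // Compiles in ARM mode only. volatile keeps the compiler from
      // dropping the asm block because x is never used afterwards.
      asm volatile(
              "vmrs %[result],FPSCR \n"
              "orr %[result],%[result],#16777216 \n"   // 16777216 = 1 << 24
              "vmsr FPSCR,%[result]"
              :[result] "=r" (x)
      );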

I am not sure why, but it requires LOCAL_ARM_MODE := arm in Android.mk. Maybe -mfpu=vfp-d16 instead of just vfp is required.

Manually clear denormal numbers

I have the spring code described above. I improved it by clearing denormal numbers manually, without using the FPU, with the following function:

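inline void fixDenorm(float & f){
    // Reinterpret the float's bits as an integer without touching the FPU.
    union FloatInt32 {
        unsigned int u32;
        float f32;
    };
    FloatInt32 fi;
    fi.f32 = f;

    // Bits 23..30 hold the 8-bit exponent; it is 0 for denormals and for zero.
    unsigned int exponent = (fi.u32 >> 23) & ((1 << 8) - 1);
    if(exponent == 0)
        f = 0.f;
}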
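The same integer-only check can also be written without the union. The following is a sketch of an alternative, not the code that was actually tested on the devices: it assumes 32-bit IEEE 754 floats, copies the bits with memcpy, and uses the hypothetical names isDenormal and flushDenormal.

```cpp
#include <stdint.h>
#include <string.h>

// True for denormal floats: exponent field all zeros, mantissa non-zero.
inline bool isDenormal(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  // bit copy, no floating-point operation
    const uint32_t exponent = (bits >> 23) & 0xFFu;
    const uint32_t mantissa = bits & 0x007FFFFFu;
    return exponent == 0 && mantissa != 0;
}

// Replaces a denormal with zero and leaves every other value untouched.
inline void flushDenormal(float& f) {
    if (isDenormal(f)) {
        f = 0.0f;
    }
}
```

Unlike fixDenorm, this variant also checks the mantissa, so it leaves -0.0f as it is instead of rewriting it to +0.0f.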

The original code was failing in many places within 15-90 seconds of starting.

The current code showed an issue possibly related to this bug in only one place, after 10 minutes of physics simulation.

Reference to the bug and the fix: http://sourceware.org/ml/binutils/2006-12/msg00196.html

They say that GCC uses only scalar code and that --vfp11-denorm-fix=scalar is enough. It adds 1 extra instruction to slow things down. But even --vfp11-denorm-fix=vector, which adds 2 extra instructions, is not enough.

The problem is not easily reproducible. On phones with a higher clock frequency (800 MHz) I see it more often than on slower ones (600 MHz). It is possible that the fix was made when there were no fast CPUs on the market.

We have many files in the project, and compiling each configuration takes around 10 minutes. Testing the current state of the fix requires ~10 minutes of play on the phone. We also heat the phone under a lamp; a hot phone shows errors faster.

I wish I could test different configurations and report which fix is most effective. But right now we have to add a hack to kill the last bug possibly related to denorms.

I expected to find a silver bullet that would fix it, but only -msoft-float, with a 10x performance degradation, or running the app on ARMv7 does it.

After I replaced the previous fixDenorm function with the new fixDenormE in the spring/damping code and applied the new function to the ViewMatrix, I got rid of the last bug.

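inline void fixDenormE(float & f, float epsilon = 1e-8f){
    union Data32 {
        unsigned int u32;
        float f32;
    };
    Data32 d;
    d.f32 = f;

    // Integer check first: a zero exponent field means denormal (or zero).
    unsigned int exponent = (d.u32 >> 23) & ((1 << 8) - 1);
    if(exponent == 0)
        f = 0.f;
    // Then also clear anything smaller in magnitude than epsilon (needs <math.h>).
    if(fabsf(f) < epsilon){
        f = 0.f;
    }
}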
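Applying the epsilon clamp to something like the ViewMatrix means running it over every element. A minimal sketch, assuming the matrix is available as a plain array of floats; flushSmallValues is a hypothetical helper, and unlike the integer check above it relies on FPU comparisons.

```cpp
#include <math.h>
#include <stddef.h>

// Zeroes every element whose magnitude is below epsilon. Denormals are far
// smaller than any practical epsilon, so they are cleared as well.
// NaN elements are left untouched because the comparison is false for NaN.
// Returns the number of elements that were changed.
inline size_t flushSmallValues(float* values, size_t count, float epsilon = 1e-8f) {
    size_t flushed = 0;
    for (size_t i = 0; i < count; ++i) {
        if (values[i] != 0.0f && fabsf(values[i]) < epsilon) {
            values[i] = 0.0f;
            ++flushed;
        }
    }
    return flushed;
}
```

For a 4x4 matrix stored as 16 consecutive floats, the call would be flushSmallValues(matrixData, 16), where matrixData is a pointer to the first element. The returned count is handy for logging how often the clamp actually fires.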

Answer 2:

This page has an interesting discussion on ARM FPU options: VfpComparison

I think if you want to build for ARMv6, you might do this: -march=armv6 -mcpu=generic-armv6 -mfloat-abi=softfp (and leave out the -mfpu option). If you are not specifically targeting the processor you mentioned above, generic ARMv6 doesn't have a guaranteed FPU.

Another option is to try -mfloat-abi=hard, on the theory that there is a compiler bug somewhere around softfp.

Also check for stack corruption and similar problems in your code; it is possible that floating-point values get clobbered when they are passed.

P.S. You might also want to try out a floating-point tester such as TestFloat or the venerable netlib paranoia. While you have an example of floating point failing on this particular processor and with these compiler options, you don't know how widespread a problem it is. It could be worse than you think :)