ARM NEON vectorization failure

Posted 2019-03-30 01:26

I would like to enable NEON auto-vectorization on my ARM Cortex-A9, but at compile time I get this output:

"not vectorized: relevant stmt not supported: D.14140_82 = D.14143_77 * D.14141_81"

Here is my loop:

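#include <arm_neon.h>  /* defines float32_t; SIZE is a macro defined elsewhere (not shown) */

void my_mul(float32_t * __restrict data1, float32_t * __restrict data2, float32_t * __restrict out){
    for(int i=0; i<SIZE*4; i+=1){
        out[i] = data1[i]*data2[i];   /* element-wise float multiply */
    }
}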

And these are the compile options:

-march=armv7-a -mcpu=cortex-a9 -mfpu=neon -mfloat-abi=softfp -ftree-vectorize -mvectorize-with-neon-quad -ftree-vectorizer-verbose=2

I am using the arm-linux-gnueabi GCC cross-compiler (v4.6).

Note that the problem only appears with float32 data: if I switch to int32, the loop is vectorized. Maybe vectorization for float32 is not supported yet?

Does anyone have an idea? Am I missing something on the command line or in my implementation?

Thanks in advance for your help.

Guix

Tags: compiler-construction arm vectorization neon

1 Answer

chillily

Answered 2019-03-30 01:55

From GCC's ARM options page:

-mfpu=name

...

If the selected floating-point hardware includes the NEON extension (e.g. -mfpu=`neon'), note that floating-point operations are not generated by GCC's auto-vectorization pass unless -funsafe-math-optimizations is also specified. This is because NEON hardware does not fully implement the IEEE 754 standard for floating-point arithmetic (in particular denormal values are treated as zero), so the use of NEON instructions may lead to a loss of precision.

If you add -funsafe-math-optimizations to your command line, the loop should be vectorized, but reread the note above if you need full IEEE 754 precision (for example, if your data can contain denormal values).
