Why is strchr twice as fast as my SIMD code?

I am learning SIMD and was curious whether it is possible to beat strchr at finding a character. strchr appears to use the same intrinsics, but I assume it also checks each chunk for the null terminator, whereas I know the character is in the array and plan to skip that check.
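To make the saving concrete, here is a scalar sketch of the two loop shapes. The helper names are hypothetical and not taken from the question or from any libc. A strchr-style loop tests every byte against both the target and the NUL terminator, while a known-present loop does a single comparison per byte but is only safe when the character really is in the buffer.

```c
#include <stddef.h>

/* Hypothetical strchr-style search: each byte is checked against the
   target and against the NUL terminator. */
static const char *find_with_nul_check(const char *s, char c) {
    for (; *s != c; s++) {
        if (*s == '\0') {
            return NULL;   /* reached the terminator without finding c */
        }
    }
    return s;
}

/* Hypothetical known-present search: one comparison per byte, but it
   reads past the end of the buffer if c is not actually there. */
static const char *find_known_present(const char *s, char c) {
    while (*s != c) {
        s++;
    }
    return s;
}
```

A SIMD version keeps roughly the same structure: a strchr-style loop needs an extra compare against zero (or an equivalent trick) per chunk, which is the work the question hopes to avoid.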

My code is:

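#include <immintrin.h>  // AVX2 intrinsics
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t N = 1e9;
    bool found = false; // Not really used ...
    size_t char_index1 = 0;
    size_t char_index2 = 0;
    char *str = malloc(N);
    if (str == NULL) {
        return 1;
    }
    memset(str, 'a', N);

    __m256i char_match;
    __m256i str_simd;
    __m256i result;
    __m256i *pSrc1;

    int simd_mask;

    // Place the only 'b' halfway through the buffer.
    str[(size_t)5e8] = 'b';

    char_match = _mm256_set1_epi8('b'); // 'b' in all 32 byte lanes
    result = _mm256_set1_epi32(0);

    simd_mask = 0;

    pSrc1 = (__m256i *)str;

    // No NUL or length check: 'b' is known to be in the buffer.
    while (1) {
        str_simd = _mm256_lddqu_si256(pSrc1);             // load 32 bytes (unaligned is fine)
        result = _mm256_cmpeq_epi8(str_simd, char_match); // 0xFF in each lane equal to 'b'
        simd_mask = _mm256_movemask_epi8(result);         // one bit per byte lane
        if (simd_mask != 0) {
            break;
        }
        pSrc1++;
    }

    free(str);
    return 0;
}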

Full (not yet finished) code at: https://gist.github.com/JimHokanson/433e185ba53b41e49ce3ac804568ac1e
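The excerpt breaks out of the loop without computing where the match is. A minimal sketch of turning the mask into an index, assuming GCC or Clang (for `__builtin_ctz` and the `target("avx2")` function attribute, which lets the file build without -mavx2; the CPU must still support AVX2) and, as in the question, that the character is known to be present. The function name is hypothetical.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical known-present AVX2 search returning the byte index of the
   first c. Reads past the end of the buffer if c is absent. */
__attribute__((target("avx2")))
static size_t find_known_present_avx2(const char *str, char c) {
    const __m256i needle = _mm256_set1_epi8(c);   /* c broadcast to all 32 lanes */
    const char *p = str;
    for (;;) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)p);
        /* Bit i of mask is set when byte i of the chunk equals c. */
        unsigned mask = (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle));
        if (mask != 0) {
            /* The lowest set bit marks the first match in this chunk. */
            return (size_t)(p - str) + (size_t)__builtin_ctz(mask);
        }
        p += 32;
    }
}
```

Because bit i of the movemask result corresponds to byte i of the chunk, the first match is the lowest set bit, found with a trailing-zero count (`__builtin_ctz`, which compiles to a bsf or tzcnt instruction on x86). A leading-zero count, which is what -mlzcnt enables, counts from the other end and would give the last match in the chunk instead.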

strchr is twice as fast as this code (with both gcc and Xcode). I'm hoping to understand why.

Update: I am compiling with: gcc -std=c11 -mavx2 -mlzcnt

Tags: c simd

1 answer

Juvenile、少年°

Post #2 · 2019-03-04 05:16

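I had not set an optimization flag. Without optimization, gcc typically keeps every variable (including the __m256i ones) on the stack and reloads it for each statement, so the loop does far more memory traffic than the intrinsics suggest. Compiling with -O3, the SIMD code takes only 75% of the time that strchr takes.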
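Since the result flipped once optimization was enabled, both candidates should be timed in the same optimized build (for example gcc -std=c11 -O3 -mavx2). A hypothetical sketch of a small harness: it times any function with strchr's signature using the standard `clock()`, and uses a GCC/Clang empty asm statement as a barrier so an optimizing build cannot drop or merge the repeated calls. The helper name and the `reps` parameter are assumptions for illustration.

```c
#include <stdio.h>
#include <time.h>

/* strchr's own signature, so strchr can be passed in directly. */
typedef char *(*find_fn)(const char *s, int c);

/* Hypothetical helper: calls find(s, c) `reps` times (reps > 0), prints the
   average processor time per call, and returns the match index from the
   last call, or -1 when there is no match. */
static long time_find(const char *label, find_fn find, const char *s, int c, int reps) {
    const char *hit = NULL;
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        hit = find(s, c);
        /* GCC/Clang barrier: the asm consumes `hit` and clobbers memory,
           so each call must actually run. */
        __asm__ __volatile__("" : : "r"(hit) : "memory");
    }
    clock_t end = clock();
    printf("%s: %.6f ms per call\n", label,
           1000.0 * (double)(end - start) / CLOCKS_PER_SEC / reps);
    return hit != NULL ? (long)(hit - s) : -1L;
}
```

For example, `time_find("strchr", strchr, str, 'b', 10)` times strchr on the question's buffer; a SIMD search wrapped to the same signature can be timed the same way in the same binary.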

Update: I should also clarify that this is not a final working version of the code. There are still additional checks to put in place, and (I think) possible ways of optimizing the calls. At this point, though, the code is at least in the same ballpark as strchr. As pointed out in the question comments, this version could read past the end of the buffer into an unmapped page and fault. Finally, this is mostly a SIMD learning exercise for me; memchr is probably your best bet, although I suspect you might be able to beat memchr slightly if you have a sentinel buffer.
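The sentinel buffer mentioned above also addresses the page-fault concern. A minimal sketch, assuming the caller allocates at least n + 32 bytes so every 32-byte load stays inside the allocation (a hypothetical contract, as is the function name), and GCC or Clang for the `target` attribute and `__builtin_ctz`. The target character is written at buf[n], so the unchecked loop is guaranteed to stop by index n, and a result equal to n means the character was not found.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical sentinel search. Caller guarantees buf has room for at
   least n + 32 bytes; buf[n] is temporarily overwritten and restored. */
__attribute__((target("avx2")))
static size_t find_with_sentinel_avx2(char *buf, size_t n, char c) {
    char saved = buf[n];
    buf[n] = c;                                   /* sentinel: loop stops by index n */
    const __m256i needle = _mm256_set1_epi8(c);
    size_t i = 0;
    for (;;) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        unsigned mask = (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle));
        if (mask != 0) {
            i += (size_t)__builtin_ctz(mask);     /* first matching byte */
            break;
        }
        i += 32;
    }
    buf[n] = saved;                               /* restore the padding byte */
    return i;                                     /* i == n: c not in buf[0..n) */
}
```

This keeps the single compare per chunk from the question, at the cost of requiring a writable, padded buffer; memchr needs neither.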
