How the glibc strlen() implementation works [duplicate]

Posted 2019-01-17 07:33

Question:

This question already has an answer here:

  • strlen performance implementation (3 answers)

The strlen() from K&R takes only a few lines.

int strlen(char *s)
{
    char *p = s;
    while (*p != '\0')
        p++;
    return p - s;
}

But the glibc version is much longer. For simplicity, I removed all the comments and the 64-bit implementation; the extracted strlen() looks like this:

size_t strlen(const char *str)
{
    const char *char_ptr;
    const unsigned long int *longword_ptr;
    unsigned long int longword, magic_bits, himagic, lomagic;

    for (char_ptr = str; ((unsigned long int) char_ptr 
             & (sizeof (longword) - 1)) != 0; ++char_ptr)
       if (*char_ptr == '\0')
           return char_ptr - str;

    longword_ptr = (unsigned long int *) char_ptr;

    himagic = 0x80808080L;
    lomagic = 0x01010101L;

    for (;;)
    { 
        longword = *longword_ptr++;

        if (((longword - lomagic) & himagic) != 0)
        {
            const char *cp = (const char *) (longword_ptr - 1);

            if (cp[0] == 0)
                return cp - str;
            if (cp[1] == 0)
                return cp - str + 1;
            if (cp[2] == 0)
                return cp - str + 2;
            if (cp[3] == 0)
                return cp - str + 3;
        }
    }
}

With the help of a very helpful comment, I understand most of how this works. Instead of checking for '\0' byte by byte, the glibc strlen() checks one word at a time (4 bytes on a 32-bit machine, 8 bytes on a 64-bit machine). This improves performance when the string is relatively long.

It checks the first few characters byte by byte until char_ptr is aligned on a longword boundary. Then it loops over longwords, checking whether the current longword contains any all-zero byte; if it does, it finds which byte it is and returns the length.
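To see just the alignment test in isolation, here is a tiny standalone sketch (my own illustration, not glibc code) that counts how many bytes the byte-by-byte prologue would scan before the pointer lands on a longword boundary:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const char s[] = "hello, world";
    const char *p = s;

    /* Same test as the first for-loop above: the low bits of the address
       are non-zero until p sits on a sizeof(unsigned long) boundary, so
       at most sizeof(unsigned long) - 1 bytes are scanned here. */
    unsigned int prologue = 0;
    while (((uintptr_t) p & (sizeof (unsigned long) - 1)) != 0) {
        ++p;
        ++prologue;
    }
    printf("bytes scanned before the word loop: %u\n", prologue);
    return 0;
}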

The part I don't get is: how does the following check whether one of the bytes in longword is all zeros?

if (((longword - lomagic) & himagic) != 0)

I can build a longword value such as 0x81818181, which makes (0x81818181 - 0x01010101) & 0x80808080 non-zero even though it contains no all-zero byte.
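For concreteness, here is the arithmetic for that value (a small standalone demo, not part of glibc):

#include <stdio.h>

int main(void)
{
    unsigned int w = 0x81818181u;     /* 32-bit word with no zero byte */
    unsigned int lomagic = 0x01010101u;
    unsigned int himagic = 0x80808080u;

    /* 0x81818181 - 0x01010101 = 0x80808080, and 0x80808080 & 0x80808080
       is non-zero, so the test fires although no byte of w is zero. */
    printf("%#x\n", (w - lomagic) & himagic);   /* prints 0x80808080 */
    return 0;
}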

Is this related to the fact that ASCII values range from 0 to 127, so 0x81 isn't valid ASCII? But I don't think the C standard forces strings to use ASCII.

Answer 1:

I figured it out. I can't believe it's that simple; I spent the last half hour on it.

It's OK that the check

if (((longword - lomagic) & himagic) != 0)

lets values like 0x81818181 pass: if such a value passes, the per-byte tests that follow find no zero byte and do not return, so the loop simply continues with the next longword.
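As a sanity check, here is a stripped-down 32-bit sketch with the same shape: a loose per-word test followed by byte-level confirmation. It is my own toy code, not glibc's, and like glibc it relies on aligned word reads and pointer casts that are outside strict ISO C; it is compared against the library strlen() on a string full of 0x81 bytes:

#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit-word strlen with the same structure as the glibc extract:
   a loose per-word test that may fire spuriously, followed by byte
   checks that return only when a zero byte is really there. */
static size_t wordwise_strlen(const char *str)
{
    const char *p = str;

    /* Byte by byte until p is aligned on a 4-byte boundary. */
    while (((uintptr_t) p & 3) != 0) {
        if (*p == '\0')
            return (size_t) (p - str);
        ++p;
    }

    const uint32_t *wp = (const uint32_t *) p;
    for (;;) {
        uint32_t w = *wp++;
        if (((w - 0x01010101u) & 0x80808080u) != 0) {
            /* Maybe a zero byte, maybe a false positive (e.g. 0x81 bytes):
               confirm byte by byte, and fall through to the next word
               if nothing is actually zero. */
            const char *cp = (const char *) (wp - 1);
            if (cp[0] == 0) return (size_t) (cp - str);
            if (cp[1] == 0) return (size_t) (cp - str) + 1;
            if (cp[2] == 0) return (size_t) (cp - str) + 2;
            if (cp[3] == 0) return (size_t) (cp - str) + 3;
        }
    }
}

int main(void)
{
    /* The 0x81 bytes trigger the loose test but contain no zero byte,
       so the result must still agree with the library strlen(). */
    const char *s = "abc\x81\x81\x81\x81xyz";
    assert(wordwise_strlen(s) == strlen(s));
    return 0;
}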


The algorithm behind the check is based on the "Determine if a word has a zero byte" trick from Bit Twiddling Hacks:

unsigned int v;  /* 32-bit word to check if any 8-bit byte in it is 0 */
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);

In 2's complement arithmetic, subtracting 0x01010101 has the same effect as adding 0xFEFEFEFF. The difference is that glibc does not apply v & 0x7F7F7F7F, which would clear the most significant bit of every byte and so rule out false positives like 0x81818181. glibc can omit it because, as explained above, it doesn't need to reject such values; the check is correct as long as it never misses a word that really does contain an all-zero byte.
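To make the comparison concrete, here is a small side-by-side of the exact Bit Twiddling Hacks test and the looser glibc-style test on a few hand-picked 32-bit values (my own demo; the sample words are arbitrary illustrations):

#include <stddef.h>
#include <stdio.h>

/* The full Bit Twiddling Hacks test: exact, no false positives. */
static int haszero_exact(unsigned int v)
{
    return ~((((v & 0x7F7F7F7Fu) + 0x7F7F7F7Fu) | v) | 0x7F7F7F7Fu) != 0;
}

/* The looser glibc-style test: may fire for bytes >= 0x81, but never
   stays silent when a zero byte is present. */
static int haszero_loose(unsigned int v)
{
    return ((v - 0x01010101u) & 0x80808080u) != 0;
}

int main(void)
{
    unsigned int samples[] = {
        0x81818181u,  /* no zero byte: loose fires, exact does not */
        0x41424344u,  /* "ABCD": neither fires */
        0x41420044u,  /* contains a zero byte: both fire */
        0x00000000u,  /* all bytes zero: both fire */
    };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        printf("%#010x  exact=%d  loose=%d\n",
               samples[i], haszero_exact(samples[i]), haszero_loose(samples[i]));
    return 0;
}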