Better to compare slices or bytes?

Published 2020-08-07 16:53

Question:

I'm just curious which of these methods is better (or whether there's an even better one I'm missing). I'm trying to determine whether the first and last letters of a word are the same, and two solutions seem obvious to me.

if word[:1] == word[len(word)-1:]

or

if word[0] == word[len(word)-1]

As I understand it, the first is just pulling slices of the string and doing a string comparison, while the second is pulling the character from either end and comparing as bytes.

I'm curious if there's a performance difference between the two, and if there's any "preferable" way to do this?

Answer 1:

If by letter you mean rune, then use:

import "unicode/utf8"

func eqRune(s string) bool {
    if s == "" {
        return false // or true if that makes more sense for the app
    }
    f, fsize := utf8.DecodeRuneInString(s)     // first rune and its size in bytes
    l, lsize := utf8.DecodeLastRuneInString(s) // last rune and its size in bytes
    if f != l {
        return false
    }
    if f == utf8.RuneError {
        // RuneError is returned both for invalid UTF-8 (size 1) and for a
        // genuine U+FFFD (size 3). Compare the underlying bytes so invalid
        // bytes are compared as bytes and a genuine U+FFFD still matches.
        return s[:fsize] == s[len(s)-lsize:]
    }
    return true
}
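
The sizes returned by the two decode functions also allow a shorter variant: compare the UTF-8 byte sequences of the first and last runes directly. This is a sketch, and the name `sameEndRunes` is hypothetical. Because each valid rune has exactly one UTF-8 encoding, this matches a rune comparison for valid input, while invalid bytes (decoded as a 1-byte `utf8.RuneError`) are compared byte-for-byte. Unlike converting the whole string to `[]rune`, it only inspects the two ends.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sameEndRunes (hypothetical name) reports whether s begins and ends with
// the same rune. Invalid UTF-8 at either end decodes as a 1-byte
// utf8.RuneError, so in that case the raw bytes are what get compared.
func sameEndRunes(s string) bool {
	if s == "" {
		return false // or true, depending on what the application needs
	}
	_, firstSize := utf8.DecodeRuneInString(s)
	_, lastSize := utf8.DecodeLastRuneInString(s)
	// Compare the encoded bytes of the first and last runes.
	return s[:firstSize] == s[len(s)-lastSize:]
}

func main() {
	for _, s := range []string{"eve", "foo", "世界世", "世界", "\xffa\xff"} {
		fmt.Printf("%q: %t\n", s, sameEndRunes(s))
	}
}
```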

If you mean byte, then use:

func eqByte(s string) bool {
    if s == "" {
        return false // or true if that makes more sense for the app
    }
    return s[0] == s[len(s)-1]
}

Comparing individual bytes is faster than comparing string slices as shown by the benchmark in another answer.




Answer 2:

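In Go, strings are conventionally UTF-8 encoded (Go source is UTF-8, so a string literal without byte-level escapes holds valid UTF-8). UTF-8 is a variable-length encoding.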

package main

import "fmt"

func main() {
    word := "世界世"
    fmt.Println(word[:1] == word[len(word)-1:])
    fmt.Println(word[0] == word[len(word)-1])
}

Output:

false
false

If you really do want to compare bytes rather than characters, say exactly that to the compiler: compare a byte, not a one-byte slice.

BenchmarkSlice-4    200000000            7.55 ns/op
BenchmarkByte-4     2000000000           1.08 ns/op

package main

import "testing"

var word = "word"

func BenchmarkSlice(b *testing.B) {
    for i := 0; i < b.N; i++ {
        if word[:1] == word[len(word)-1:] {
        }
    }
}

func BenchmarkByte(b *testing.B) {
    for i := 0; i < b.N; i++ {
        if word[0] == word[len(word)-1] {
        }
    }
}
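
Both benchmarks above leave the comparison's result unused inside an empty `if`, so the compiler may be able to drop the comparison, which would make the numbers misleading. A sketch of a common precaution: store each result in a package-level variable (the name `sink` is arbitrary). It calls `testing.Benchmark` so it can run as an ordinary program instead of through `go test`; timings depend on the machine and Go version, so none are shown.

```go
package main

import (
	"fmt"
	"testing"
)

var word = "word"

// sink is written on every iteration so the comparison's result is used
// and cannot be discarded as dead code.
var sink bool

func benchSlice(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = word[:1] == word[len(word)-1:]
	}
}

func benchByte(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = word[0] == word[len(word)-1]
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	fmt.Println("slice:", testing.Benchmark(benchSlice))
	fmt.Println("byte: ", testing.Benchmark(benchByte))
}
```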


Answer 3:

A string is a sequence of bytes. Both of your methods compare bytes, so they work only if you know the string contains only ASCII characters. Otherwise, use a method that handles multibyte characters instead of string indexing. For example, you can convert the string to a rune slice and compare code points, like this:

    r := []rune(s)
    if len(r) == 0 {
        return false // an empty string has no first or last rune
    }
    return r[0] == r[len(r)-1]

You can read more about strings, byte slices, runes, and code points in the official Go Blog post on the subject.

To answer your question, there's no practically significant performance difference between the two index expressions you posted; both take constant time regardless of the string's length.

Here's a runnable example:

package main

import "fmt"

func EndsMatch(s string) bool {
    r := []rune(s)
    if len(r) == 0 {
        return false // an empty string has no first or last rune
    }
    return r[0] == r[len(r)-1]
}

func main() {
    tests := []struct {
        s string
        e bool
    }{
        {"", false},
        {"foo", false},
        {"eve", true},
        {"世界世", true},
    }
    for _, t := range tests {
        r := EndsMatch(t.s)
        if r != t.e {
            fmt.Printf("EndsMatch(%s) failed: expected %t, got %t\n", t.s, t.e, r)
        }
    }
}

Prints nothing.
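
One caveat: a rune is a Unicode code point, not necessarily what a reader perceives as a single letter. The same visible word can be spelled with a precomposed character or with a base letter followed by a combining mark, and a rune-based comparison treats the two spellings differently. A small sketch (the helper name `endsMatchRunes` is hypothetical and simply repeats the rune-slice approach):

```go
package main

import "fmt"

// endsMatchRunes (hypothetical name) compares the first and last runes
// (code points) of s, like the rune-slice approach above.
func endsMatchRunes(s string) bool {
	r := []rune(s)
	if len(r) == 0 {
		return false
	}
	return r[0] == r[len(r)-1]
}

func main() {
	precomposed := "\u00e9t\u00e9"  // "été" spelled with U+00E9
	decomposed := "e\u0301te\u0301" // "été" spelled with 'e' + U+0301 COMBINING ACUTE ACCENT

	fmt.Println(endsMatchRunes(precomposed)) // true
	fmt.Println(endsMatchRunes(decomposed))  // false: first rune 'e', last rune U+0301
}
```

The standard library does not segment strings into user-perceived characters; normalizing first, for example with `norm.NFC.String` from `golang.org/x/text/unicode/norm`, would make both spellings compare equal here.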



Tags: go