How to efficiently concatenate strings in Go?

2019-01-05 06:20发布

In Go, a string is a primitive type, which means it is read-only, and every manipulation of it will create a new string.

So if I want to concatenate strings many times without knowing the length of the resulting string, what's the best way to do it?

The naive way would be:

s := ""
for i := 0; i < 1000; i++ {
    s += getShortStringFromSomewhere()
}
return s

but that does not seem very efficient.

20条回答
走好不送
2楼-- · 2019-01-05 07:00

strings.Join() from the "strings" package

If you have a type mismatch(like if you are trying to join an int and a string), you do RANDOMTYPE (thing you want to change)

EX:

package main

import "strings"

var intEX = 0
var stringEX = "hello all you "
var stringEX2 = " people in here"

func main() {
    strings.Join(stringEX, string(intEX), stringEX2)
}
查看更多
狗以群分
3楼-- · 2019-01-05 07:03
s := fmt.Sprintf("%s%s", []byte(s1), []byte(s2))
查看更多
趁早两清
4楼-- · 2019-01-05 07:04

Beginning with Go 1.10 there is a strings.Builder, here.

A Builder is used to efficiently build a string using Write methods. It minimizes memory copying. The zero value is ready to use.


Usage:

It's almost the same with bytes.Buffer.

package main

import (
    "strings"
    "fmt"
)

func main() {
    var str strings.Builder

    for i := 0; i < 1000; i++ {
        str.WriteString("a")
    }

    fmt.Println(str.String())
}

StringBuilder methods and interfaces it supports:

Its methods are being implemented with the existing interfaces in mind so that you can switch to the new Builder easily in your code.


Zero value usage:

var sb strings.Builder

Differences from bytes.Buffer:

  • It can only grow or reset.

  • In bytes.Buffer underlying bytes can escape like this: (*Buffer).Bytes(); strings.Builder prevents this problem.

  • It also has a copyCheck mechanism inside which prevents accidentially copying it (func (b *Builder) copyCheck() { ... }).


Check out its source code here.

查看更多
孤傲高冷的网名
5楼-- · 2019-01-05 07:04

I just benchmarked the top answer posted above in my own code (a recursive tree walk) and the simple concat operator is actually faster than the BufferString.

func (r *record) String() string {
    buffer := bytes.NewBufferString("");
    fmt.Fprint(buffer,"(",r.name,"[")
    for i := 0; i < len(r.subs); i++ {
        fmt.Fprint(buffer,"\t",r.subs[i])
    }
    fmt.Fprint(buffer,"]",r.size,")\n")
    return buffer.String()
}

This took 0.81 seconds, whereas the following code:

func (r *record) String() string {
    s := "(\"" + r.name + "\" ["
    for i := 0; i < len(r.subs); i++ {
        s += r.subs[i].String()
    }
    s += "] " + strconv.FormatInt(r.size,10) + ")\n"
    return s
} 

only took 0.61 seconds. This is probably due to the overhead of creating the new BufferString.

Update: I also benchmarked the join function and it ran in 0.54 seconds.

func (r *record) String() string {
    var parts []string
    parts = append(parts, "(\"", r.name, "\" [" )
    for i := 0; i < len(r.subs); i++ {
        parts = append(parts, r.subs[i].String())
    }
    parts = append(parts, strconv.FormatInt(r.size,10), ")\n")
    return strings.Join(parts,"")
}
查看更多
欢心
6楼-- · 2019-01-05 07:04

Update 2018-04-03

As of Go 1.10, string.Builder is recommended to be a replacement for bytes.Buffer. Check 1.10 release notes

A new type Builder is a replacement for bytes.Buffer for the use case of accumulating text into a string result. The Builder's API is a restricted subset of bytes.Buffer's that allows it to safely avoid making a duplicate copy of the data during the String method.

============================================================

The benchmark code of @cd1 and other answers are wrong. b.N is not supposed to be set in benchmark function. It's set by the go test tool dynamically to determine if the execution time of the test is stable.

A benchmark function should run the same test b.N times and the test inside the loop should be the same for each iteration. So I fix it by adding an inner loop. I also add benchmarks for some other solutions:

package main

import (
    "bytes"
    "strings"
    "testing"
)

const (
    sss = "xfoasneobfasieongasbg"
    cnt = 10000
)

var (
    bbb      = []byte(sss)
    expected = strings.Repeat(sss, cnt)
)

func BenchmarkCopyPreAllocate(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        bs := make([]byte, cnt*len(sss))
        bl := 0
        for i := 0; i < cnt; i++ {
            bl += copy(bs[bl:], sss)
        }
        result = string(bs)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkAppendPreAllocate(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        data := make([]byte, 0, cnt*len(sss))
        for i := 0; i < cnt; i++ {
            data = append(data, sss...)
        }
        result = string(data)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkBufferPreAllocate(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        buf := bytes.NewBuffer(make([]byte, 0, cnt*len(sss)))
        for i := 0; i < cnt; i++ {
            buf.WriteString(sss)
        }
        result = buf.String()
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkCopy(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        data := make([]byte, 0, 64) // same size as bootstrap array of bytes.Buffer
        for i := 0; i < cnt; i++ {
            off := len(data)
            if off+len(sss) > cap(data) {
                temp := make([]byte, 2*cap(data)+len(sss))
                copy(temp, data)
                data = temp
            }
            data = data[0 : off+len(sss)]
            copy(data[off:], sss)
        }
        result = string(data)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkAppend(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        data := make([]byte, 0, 64)
        for i := 0; i < cnt; i++ {
            data = append(data, sss...)
        }
        result = string(data)
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkBufferWrite(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        var buf bytes.Buffer
        for i := 0; i < cnt; i++ {
            buf.Write(bbb)
        }
        result = buf.String()
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkBufferWriteString(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        var buf bytes.Buffer
        for i := 0; i < cnt; i++ {
            buf.WriteString(sss)
        }
        result = buf.String()
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

func BenchmarkConcat(b *testing.B) {
    var result string
    for n := 0; n < b.N; n++ {
        var str string
        for i := 0; i < cnt; i++ {
            str += sss
        }
        result = str
    }
    b.StopTimer()
    if result != expected {
        b.Errorf("unexpected result; got=%s, want=%s", string(result), expected)
    }
}

Environment is OS X 10.11.6, 2.2 GHz Intel Core i7

Test results:

BenchmarkCopyPreAllocate-8         20000             84208 ns/op          425984 B/op          2 allocs/op
BenchmarkAppendPreAllocate-8       10000            102859 ns/op          425984 B/op          2 allocs/op
BenchmarkBufferPreAllocate-8       10000            166407 ns/op          426096 B/op          3 allocs/op
BenchmarkCopy-8                    10000            160923 ns/op          933152 B/op         13 allocs/op
BenchmarkAppend-8                  10000            175508 ns/op         1332096 B/op         24 allocs/op
BenchmarkBufferWrite-8             10000            239886 ns/op          933266 B/op         14 allocs/op
BenchmarkBufferWriteString-8       10000            236432 ns/op          933266 B/op         14 allocs/op
BenchmarkConcat-8                     10         105603419 ns/op        1086685168 B/op    10000 allocs/op

Conclusion:

  1. CopyPreAllocate is the fastest way; AppendPreAllocate is pretty close to No.1, but it's easier to write the code.
  2. Concat has really bad performance both for speed and memory usage. Don't use it.
  3. Buffer#Write and Buffer#WriteString are basically the same in speed, contrary to what @Dani-Br said in the comment. Considering string is indeed []byte in Go, it makes sense.
  4. bytes.Buffer basically use the same solution as Copy with extra book keeping and other stuff.
  5. Copy and Append use a bootstrap size of 64, the same as bytes.Buffer
  6. Append use more memory and allocs, I think it's related to the grow algorithm it use. It's not growing memory as fast as bytes.Buffer

Suggestion:

  1. For simple task such as what OP wants, I would use Append or AppendPreAllocate. It's fast enough and easy to use.
  2. If need to read and write the buffer at the same time, use bytes.Buffer of course. That's what it's designed for.
查看更多
Bombasti
7楼-- · 2019-01-05 07:06

You could create a big slice of bytes and copy the bytes of the short strings into it using string slices. There is a function given in "Effective Go":

func Append(slice, data[]byte) []byte {
    l := len(slice);
    if l + len(data) > cap(slice) { // reallocate
        // Allocate double what's needed, for future growth.
        newSlice := make([]byte, (l+len(data))*2);
        // Copy data (could use bytes.Copy()).
        for i, c := range slice {
            newSlice[i] = c
        }
        slice = newSlice;
    }
    slice = slice[0:l+len(data)];
    for i, c := range data {
        slice[l+i] = c
    }
    return slice;
}

Then when the operations are finished, use string ( ) on the big slice of bytes to convert it into a string again.

查看更多
登录 后发表回答