Python equivalent of Java StringBuffer?

2019-01-13 17:31发布

问题:

Is there anything in Python like Java's StringBuffer? Since strings are immutable in Python too, editing them in loops would be inefficient.

回答1:

Efficient String Concatenation in Python is a rather old article and its main statement that the naive concatenation is far slower than joining is not valid anymore, because this part has been optimized in CPython since then:

CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations. @ http://docs.python.org/2/library/stdtypes.html

I've adapted their code a bit and got the following results on my machine:

from cStringIO import StringIO
from UserString import MutableString
from array import array

import sys, timeit

def method1():
    out_str = ''
    for num in xrange(loop_count):
        out_str += `num`
    return out_str

def method2():
    out_str = MutableString()
    for num in xrange(loop_count):
        out_str += `num`
    return out_str

def method3():
    char_array = array('c')
    for num in xrange(loop_count):
        char_array.fromstring(`num`)
    return char_array.tostring()

def method4():
    str_list = []
    for num in xrange(loop_count):
        str_list.append(`num`)
    out_str = ''.join(str_list)
    return out_str

def method5():
    file_str = StringIO()
    for num in xrange(loop_count):
        file_str.write(`num`)
    out_str = file_str.getvalue()
    return out_str

def method6():
    out_str = ''.join([`num` for num in xrange(loop_count)])
    return out_str

def method7():
    out_str = ''.join(`num` for num in xrange(loop_count))
    return out_str


loop_count = 80000

print sys.version

print 'method1=', timeit.timeit(method1, number=10)
print 'method2=', timeit.timeit(method2, number=10)
print 'method3=', timeit.timeit(method3, number=10)
print 'method4=', timeit.timeit(method4, number=10)
print 'method5=', timeit.timeit(method5, number=10)
print 'method6=', timeit.timeit(method6, number=10)
print 'method7=', timeit.timeit(method7, number=10)

Results:

2.7.1 (r271:86832, Jul 31 2011, 19:30:53) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
method1= 0.171155929565
method2= 16.7158739567
method3= 0.420584917068
method4= 0.231794118881
method5= 0.323612928391
method6= 0.120429992676
method7= 0.145267963409

Conclusions:

  • join still wins over concat, but marginally
  • list comprehensions are faster than loops
  • joining generators is slower than joining lists
  • other methods are of no use (unless you're doing something special)


回答2:

Depends on what you want to do. If you want a mutable sequence, the builtin list type is your friend, and going from str to list and back is as simple as:

 mystring = "abcdef"
 mylist = list(mystring)
 mystring = "".join(mylist)

If you want to build a large string using a for loop, the pythonic way is usually to build a list of strings then join them together with the proper separator (linebreak or whatever).

Else you can also use some text template system, or a parser or whatever specialized tool is the most appropriate for the job.



回答3:

Perhaps use a bytearray:

In [1]: s = bytearray('Hello World')

In [2]: s[:5] = 'Bye'

In [3]: s
Out[3]: bytearray(b'Bye World')

In [4]: str(s)
Out[4]: 'Bye World'

The appeal of using a bytearray is its memory-efficiency and convenient syntax. It can also be faster than using a temporary list:

In [36]: %timeit s = list('Hello World'*1000); s[5500:6000] = 'Bye'; s = ''.join(s)
1000 loops, best of 3: 256 µs per loop

In [37]: %timeit s = bytearray('Hello World'*1000); s[5500:6000] = 'Bye'; str(s)
100000 loops, best of 3: 2.39 µs per loop

Note that much of the difference in speed is attributable to the creation of the container:

In [32]: %timeit s = list('Hello World'*1000)
10000 loops, best of 3: 115 µs per loop

In [33]: %timeit s = bytearray('Hello World'*1000)
1000000 loops, best of 3: 1.13 µs per loop


回答4:

The previously provided answers are almost always best. However, sometimes the string is built up across many method calls and/or loops, so it's not necessarily natural to build up a list of lines and then join them. And since there's no guarantee you are using CPython or that CPython's optimization will apply, then another approach is to just use print!

Here's an example helper class, although the helper class is trivial and probably unnecessary, it serves to illustrate the approach (Python 3):

import io

class StringBuilder(object):

  def __init__(self):
    self._stringio = io.StringIO()

  def __str__(self):
    return self._stringio.getvalue()

  def append(self, *objects, sep=' ', end=''):
    print(*objects, sep=sep, end=end, file=self._stringio)

sb = StringBuilder()
sb.append('a')
sb.append('b', end='\n')
sb.append('c', 'd', sep=',', end='\n')
print(sb)  # 'ab\nc,d\n'


回答5:

this link might be useful for concatenation in python

http://pythonadventures.wordpress.com/2010/09/27/stringbuilder/

example from above link:

def g():
    sb = []
    for i in range(30):
        sb.append("abcdefg"[i%7])

    return ''.join(sb)

print g()   

# abcdefgabcdefgabcdefgabcdefgab


回答6:

Just a test I run on python 3.6.2 showing that "join" still win BIG!

from time import time


def _with_format(i):
    _st = ''
    for i in range(0, i):
        _st = "{}{}".format(_st, "0")
    return _st


def _with_s(i):
    _st = ''
    for i in range(0, i):
        _st = "%s%s" % (_st, "0")
    return _st


def _with_list(i):
    l = []
    for i in range(0, i):
        l.append("0")
    return "".join(l)


def _count_time(name, i, func):
    start = time()
    r = func(i)
    total = time() - start
    print("%s done in %ss" % (name, total))
    return r

iterationCount = 1000000

r1 = _count_time("with format", iterationCount, _with_format)
r2 = _count_time("with s", iterationCount, _with_s)
r3 = _count_time("with list and join", iterationCount, _with_list)

if r1 != r2 or r2 != r3:
    print("Not all results are the same!")

And the output was:

with format done in 17.991968870162964s
with s done in 18.36879801750183s
with list and join done in 0.12142801284790039s