I'm trying to find out the most efficient way to create a long byte string (or bytearray) by concatenating multiple shorter strings when the length of the whole string is known beforehand. I made this script and came up with these results:
```python
import time

MSG = b'test message'
COUNT = 30000

def bytes_list_test():
    tStart = time.perf_counter()  # time.clock() was removed in Python 3.8
    l = []
    for i in range(COUNT):
        l.append(MSG)
    bs = b''.join(l)
    print('byte list time:', time.perf_counter() - tStart)

def bytearray_test():
    tStart = time.perf_counter()
    ba = bytearray()
    for i in range(COUNT):
        for c in MSG:
            ba.append(c)
    print('array time:', time.perf_counter() - tStart)

def initialized_bytearray_test():
    tStart = time.perf_counter()
    ba = bytearray([0x00] * len(MSG) * COUNT)
    for i in range(COUNT):
        ba[i*len(MSG):i*len(MSG)+len(MSG)] = MSG
    print('initialized array time:', time.perf_counter() - tStart)

bytes_list_test()
bytearray_test()
initialized_bytearray_test()
```
Results:
byte list time: 0.0076534920117410365
array time: 0.08107178658246994
initialized array time: 0.08843219671325642
A few questions:
1) Is creating a list of bytes and using the join() method the way to go as implied by the results?
2) Why is using a list of bytes so much faster than using a bytearray which seems like it's designed for this type of thing?
3) You would think the initialized array would be faster than the uninitialized array because the initialized array doesn't have to be resized (Note that it did occasionally perform better, but not by much and inconsistently so). Is it not faster because of the slicing operation?
The first function creates a list of pointers to the same object (NOT a list of bytes); `join` then performs one memory allocation and `COUNT` calls to `memcpy`.
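To see concretely that the list stores references rather than copies, here is a quick sanity check (not part of the original benchmark):

```python
MSG = b'test message'
COUNT = 30000

# Building the list only appends pointers; every element is the
# very same bytes object, not a copy.
l = [MSG for _ in range(COUNT)]
assert all(x is MSG for x in l)

# join() allocates the final buffer once, then copies each chunk in.
bs = b''.join(l)
assert len(bs) == len(MSG) * COUNT
```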
You can make the first function several times faster (5x in my test) by dropping the temporary list and using `itertools.repeat`:
```python
import itertools

def bytes_list_test_opt():
    tStart = time.perf_counter()
    bs = b''.join(itertools.repeat(MSG, COUNT))
    print('byte list opt time:', time.perf_counter() - tStart)
```
or, in this particular case, simply use the `*` operator of `bytes` objects, which does exactly that:
```python
bs = MSG * COUNT
```
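All three variants produce byte-for-byte identical output, which is easy to verify:

```python
import itertools

MSG = b'test message'
COUNT = 30000

joined = b''.join([MSG] * COUNT)                    # list + join
repeated = b''.join(itertools.repeat(MSG, COUNT))   # no temporary list
multiplied = MSG * COUNT                            # repetition operator

assert joined == repeated == multiplied
```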
The second function repeatedly iterates over `MSG`, stores the data byte by byte, and has to repeatedly reallocate memory as the bytearray grows.
You can make the second function almost as fast as the original (unoptimized) first one by replacing the byte-by-byte iteration with a single call to `extend`:
```python
def bytearray_test_opt():
    tStart = time.perf_counter()
    ba = bytearray()
    for i in range(COUNT):
        ba.extend(MSG)
    print('array opt time:', time.perf_counter() - tStart)
```
After this modification, the second function is slower than the first only because of the additional reallocations (~15% in my test).
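The reallocations stay cheap because `bytearray` over-allocates geometrically as it grows (a CPython implementation detail, not a language guarantee). One way to observe this is `sys.getsizeof`, which reflects the allocated buffer rather than just the bytes in use:

```python
import sys

ba = bytearray()
sizes = set()
for _ in range(1000):
    ba.extend(b'test message')
    sizes.add(sys.getsizeof(ba))

# If every extend() triggered a reallocation we could see up to 1000
# distinct sizes; geometric over-allocation keeps the count far lower.
print(len(sizes))
```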
The third function uses `bytearray`'s slice assignment, which accepts an arbitrary iterable and seems to do the same byte-by-byte iteration without recognizing that it could simply `memcpy` the bytes into place. This looks like a defect in the standard library that could be fixed.
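The generality of slice assignment is easy to demonstrate: it accepts any iterable of integers, not just bytes-like objects, which is why the operation is inherently more general than a plain `memcpy`:

```python
ba = bytearray(5)

# Any iterable of ints in range(256) is accepted, even a generator...
ba[0:5] = (i * 10 for i in range(5))
assert ba == bytearray([0, 10, 20, 30, 40])

# ...and a bytes source goes through the same assignment interface.
ba[0:5] = b'hello'
assert bytes(ba) == b'hello'
```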
As you can see from the previous optimization, allocations take a very small amount of time compared to byte-by-byte copying, so preallocating has no visible impact here. You could save some time by doing fewer index calculations, but it won't help much either:
```python
def initialized_bytearray_test_opt():
    tStart = time.perf_counter()
    L = len(MSG)
    ba = bytearray(L * COUNT)
    ofs = 0
    for i in range(COUNT):
        ba[ofs : ofs + L] = MSG
        ofs += L
    print('initialized array opt time:', time.perf_counter() - tStart)
```
The final timings from my machine:
byte list time: 0.004823000000000001
byte list opt time: 0.0008649999999999977
array time: 0.043324
array opt time: 0.005505999999999997
initialized array time: 0.05936899999999999
initialized array opt time: 0.040164000000000005
P.S. Use the `timeit` module to perform measurements like this; it provides much higher accuracy.
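For example, the first benchmark could be rewritten with `timeit` like this (the statement and the run counts are just illustrative):

```python
import timeit

setup = "MSG = b'test message'; COUNT = 30000"
stmt = "b''.join([MSG] * COUNT)"

# repeat() runs the statement in several batches; the minimum of the
# batch times is usually the most stable figure to report.
best = min(timeit.repeat(stmt, setup=setup, number=100, repeat=5))
print(f'best batch (100 runs): {best:.6f}s')
```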