A common antipattern in Python is to concatenate a sequence of strings using + in a loop. This is bad because the Python interpreter has to create a new string object for each iteration, and it ends up taking quadratic time. (Recent versions of CPython can apparently optimize this in some cases, but other implementations can't, so programmers are discouraged from relying on this.) ''.join is the right way to do this.
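To make the contrast concrete, here is a minimal sketch of both approaches. The function names and the sample list are hypothetical, chosen only for illustration.

```python
def concat_with_plus(parts):
    """Build the result with + in a loop (the antipattern described above)."""
    result = ''
    for part in parts:
        # Each step may create a new string holding everything accumulated so far.
        result += part
    return result


def concat_with_join(parts):
    """Build the result in a single call to str.join."""
    return ''.join(parts)


words = ['spam', 'eggs', 'ham']
print(concat_with_plus(words))
print(concat_with_join(words))
```

Both functions return the same string; the difference is only in how much intermediate copying may happen along the way.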
However, I've heard it said (including here on Stack Overflow) that you should never, ever use + for string concatenation, but instead always use ''.join or a format string. I don't understand why this is the case if you're only concatenating two strings. If my understanding is correct, it shouldn't take quadratic time, and I think a + b is cleaner and more readable than either ''.join((a, b)) or '%s%s' % (a, b).
Is it good practice to use + to concatenate two strings? Or is there a problem I'm not aware of?
There is nothing wrong with concatenating two strings with +. Indeed, it's easier to read than ''.join([a, b]).
You are right, though, that concatenating more than two strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However, this has nothing to do with using a loop. Even a + b + c + ... is O(n^2), the reason being that each concatenation produces a new string.
CPython 2.4 and above try to mitigate that, but it's still advisable to use join when concatenating more than two strings.
The plus operator is a perfectly fine solution for concatenating two Python strings. But if you keep adding more than two strings (n > 25), you might want to think about something else.
The ''.join([a, b, c]) trick is a performance optimization.
When working with multiple people, it's sometimes difficult to know exactly what's happening. Using a format string instead of concatenation can avoid one particular annoyance that has happened a whole ton of times to us:
Say, a function requires an argument, and you write it expecting to get a string:
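The original snippet is not preserved here. A minimal sketch of such a function, with the hypothetical names foo and zeta, might look like this:

```python
def foo(zeta):
    # Hypothetical example: the message is built with +,
    # so the function silently assumes that zeta is a str.
    return 'bar: ' + zeta


print(foo('bang'))
```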
So, this function may be used pretty often throughout the code. Your coworkers may know exactly what it does, but not necessarily be fully up-to-speed on the internals, and may not know that the function expects a string. And so they may end up with this:
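As an illustration (again with the hypothetical foo, not the original code), passing an integer to a function that concatenates with + fails at runtime:

```python
def foo(zeta):
    # Hypothetical function that assumes zeta is a str.
    return 'bar: ' + zeta


error = None
try:
    foo(23)  # a coworker passes an int instead of a str
except TypeError as exc:
    # str + int is not defined, so Python raises TypeError.
    error = exc
    print('TypeError:', exc)
```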
There would be no problem if you just used a format string:
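A sketch of the format-string variant, still using the hypothetical foo. Wrapping the argument in a one-element tuple is my own addition, so that a tuple argument is formatted as a single value:

```python
def foo(zeta):
    # %s converts the argument with str(), so non-string values are accepted.
    return 'bar: %s' % (zeta,)


print(foo('bang'))
print(foo(23))
```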
The same is true for all types of objects that define __str__, which may be passed in as well.
So yes: if you can use a format string, do it and take advantage of what Python has to offer.
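To illustrate the point about objects that define __str__, here is a small sketch. The Version class and the foo function are hypothetical and exist only for this example.

```python
class Version:
    """Hypothetical class that defines __str__."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __str__(self):
        return '%d.%d' % (self.major, self.minor)


def foo(zeta):
    # %s calls str() on the argument, which in turn uses Version.__str__.
    return 'bar: %s' % (zeta,)


print(foo(Version(3, 12)))
```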
''.join([a, b]) is a better solution than +, because code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements of the form a += b or a = a + b. This optimization is fragile even in CPython and isn't present at all in implementations that don't use refcounting (reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object, block of memory, disk space or other resource).
https://www.python.org/dev/peps/pep-0008/#programming-recommendations
According to the Python docs, using str.join() will give you consistent performance across various implementations of Python. Although CPython can optimize away the quadratic behavior of s = s + t, other Python implementations may not.
See Sequence Types in the Python docs (footnote [6]).
The assumption that one should never, ever use + for string concatenation, but instead always use ''.join, may be a myth. It is true that using + creates unnecessary temporary copies of immutable string objects, but the other, not often quoted fact is that calling join in a loop would generally add the overhead of a function call. Let's take your example.
Create two lists, one from the linked SO question and another, bigger, fabricated one.
Let's create two functions, UseJoin and UsePlus, to use the respective join and + functionality.
Let's run timeit with the first list.
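The original listings and timings are not preserved here. The following is a rough sketch of such a comparison. The list contents, the pairwise-concatenation bodies of UsePlus and UseJoin, and the repeat count are all my assumptions, and actual timings will vary by machine and Python version.

```python
import random
import timeit

# Assumed inputs: a short list and a larger fabricated one.
short_list = ['A', 'B', 'C', 'D', 'E', 'F']
random.seed(0)
long_list = [chr(random.randint(65, 90)) for _ in range(10000)]


def UsePlus(items):
    """Concatenate adjacent pairs with + (assumed body)."""
    return [items[i] + items[i + 1] for i in range(0, len(items), 2)]


def UseJoin(items):
    """Concatenate adjacent pairs with ''.join (assumed body)."""
    return [''.join((items[i], items[i + 1])) for i in range(0, len(items), 2)]


for label, data in (('short list', short_list), ('long list', long_list)):
    t_plus = timeit.timeit(lambda: UsePlus(data), number=100)
    t_join = timeit.timeit(lambda: UseJoin(data), number=100)
    print('%s: UsePlus %.4fs, UseJoin %.4fs' % (label, t_plus, t_join))
```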
They have almost the same runtime.
Let's use cProfile.
It looks like using join results in unnecessary function calls, which could add to the overhead.
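The original profiler output is not preserved here. A sketch of how such a profile could be collected follows. The function bodies, the input list, and the profile_report helper are assumptions made for illustration.

```python
import cProfile
import io
import pstats


def UsePlus(items):
    """Concatenate adjacent pairs with + (assumed body)."""
    return [items[i] + items[i + 1] for i in range(0, len(items), 2)]


def UseJoin(items):
    """Concatenate adjacent pairs with ''.join (assumed body)."""
    return [''.join((items[i], items[i + 1])) for i in range(0, len(items), 2)]


def profile_report(func, items):
    """Hypothetical helper: profile one call and return the report as text."""
    profiler = cProfile.Profile()
    profiler.runcall(func, items)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats()
    return stream.getvalue()


data = ['A', 'B', 'C', 'D', 'E', 'F'] * 1000
print(profile_report(UsePlus, data))
print(profile_report(UseJoin, data))
```

In the UseJoin report, each call to the join method shows up as its own function call, whereas + is handled by the interpreter without a separate call entry.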
Now, coming back to the question: should one discourage the use of + over join in all cases?
I believe not; the circumstances should be taken into consideration.
And of course, in development, premature optimization is evil.