I'm pasting this text from an ebook I have. It says the complexity if O(n2) and also gives an explanation for it, but I fail to see how.
Question: What is the running time of this code?
public String makeSentence(String[] words) {
StringBuffer sentence = new StringBuffer();
for (String w : words) sentence.append(w);
return sentence.toString();
}
The answer the book gave:
O(n2), where n is the number of letters in sentence. Here’s why: each time you append a string to sentence, you create a copy of sentence and run through all the letters in sentence to copy them over If you have to iterate through up to n characters each time in the loop, and you’re looping at least n times, that gives you an O(n2) run time. Ouch!
Can someone please explain this answer more clearly?
As the explanation given in the book, for ever word in the string array a new object of sentence gets created and that sentence object first copies the previous sentence and then traverses till the end of the array and then appends the new word, hence the complexity of
n^2
.Hence
n*n
will ben^2
.I tried to check it using this program
And result was, as expected, O(n):
java Test 200000 - 128 ms
java Test 500000 - 370 ms
java Test 1000000 - 698 ms
Version 1.6.0.21
The accepted answer is just wrong.
StringBuffer
has amortized O(1) append, so n appends will be O(n).If it wasn't O(1) append,
StringBuffer
would have no reason to exist, since writing that loop with plainString
concatenation would be O(n^2) as well!There's a typo in this book.
1st case :
Complexity : O(n^2) -> (n words) x (n characters copied at each iteration, for copying the current sentence into a StringBuffer)
2nd case :
Complexity : O(n) -> (n words) x O(1) (amortized complexity for StringBuffer concatenation)
It's a bit difficult to answer a question about the complexity of this code when it is written at a high level which abstracts away the details of the implementation. The Java documentation doesn't seem to give any guarantees in terms of the complexity of the
append
function. As others have pointed out, theStringBuffer
class can (and should) be written so that the complexity of appending strings does not depend on the current length of the string held inStringBuffer
.However, I suspect it is not that helpful to the person asking this question to simply say "your book is wrong!" - instead, let us see what assumptions are being made and make clear what the author was trying to say.
You can make the following assumptions:
new StringBuffer
is O(1)w
inwords
is O(1)sentence.toString
is at most O(n).The question is really what is the order of
sentence.append(w)
, and that depends on how it happens inside theStringBuffer
. The naive way is to do it like Shlemiel the Painter.The silly way
Suppose you use a C-style null-terminated string for the contents of
StringBuffer
. The way you find the end of such a string is by reading each character, one by one, until you find the null character - then to append a new string S, you can start copying characters from S to theStringBuffer
string (finishing with another null character). If you writeappend
this way, it is O(a + b), where a is the number of characters currently in theStringBuffer
, and b is the number of characters in the new word. If you loop over an array of words, and each time you have to read all the characters you just appended before appending the new word, then the complexity of the loop is O(n^2), where n is the total number of characters in all the words (also the number of characters in the final sentence).A better way
On the other hand, suppose that the contents of
StringBuffer
is still an array of characters, but we also store an integersize
which tells us how long the string is (number of characters). Now we no longer have to read every character in theStringBuffer
in order to find the end of the string; we can just look up indexsize
in the array, which is O(1) instead of O(a). Then theappend
function now only depends on the number of characters being appended, O(b). In this case the complexity of the loop is O(n), where n is the total number of characters in all the words....We're not done yet!
Finally, there's one more aspect of the implementation that hasn't been covered yet, and that is the one actually brought up by the answer in the textbook - memory allocation. Each time you want to write more characters to your
StringBuffer
, you're not guaranteed to have enough space in your character array to actually fit the new word in. If there isn't enough space, your computer needs to first allocate some more room in a clean section of memory, and then copy all the information in the oldStringBuffer
array across, and then it can continue as before. Copying data like this will take O(a) time (where a is the number of characters to be copied).In the worst case, you have to allocate more memory every time you add a new word. This basically takes us back to square one where the loop has O(n^2) complexity, and is what the book seems to suggest. If you assume that nothing crazy is happening (the words aren't getting longer at an exponential rate!), then you can probably reduce the number of memory allocations to something more like O(log(n)) by having the allocated memory grow exponentially. If that's the number of memory allocations, and memory allocations in general are O(a), then the total complexity attributed just to memory management in the loop is O(n log(n)). Since the appending work is O(n) and less than the complexity of the memory management, the total complexity of the function is O(n log(n)).
Again, the Java documentation doesn't help us in terms of how the capacity of the
StringBuffer
grows, it just says "If the internal buffer overflows, it is automatically made larger". Depending on how it happens, you could end up with either O(n^2) or O(n log(n)) overall.As an exercise left to the reader: Find an easy way to modify the function so that the overall complexity is O(n), by removing memory reallocation issues.
Here's my calculation for how they got O(n^2)
We'll ignore the CPU time for declaring the StringBuffer, as it doesn't vary with the size of the final string.
When calculating the O complexity we are concerned with the worst case, this will occur when there are 1 letter Strings. I shall explain after this example:
Let's say we have 4 one-letter strings: 'A', 'B', 'C', 'D'.
Read in A: CPU-time to find end of StringBuffer: 0 CPU-time to append 'A': 1
Read in B: CPU-time to find end of StringBuffer: 1 CPU-time to append 'B': 1
Read in C: CPU-time to find end of StringBuffer: 2 CPU-time to append 'C': 1
Read in D: CPU-time to find end of StringBuffer: 3 CPU-time to append 'D': 1
CPU-time to copy StringBuffer to String at the end: 4
Total CPU-time = 1 + 2 + 3 + 4 + 4
If we generalise this to n 1-letter words:
1 + 2 + 3 + ...... + n + n = 0.5n(n+1) + n
I did this by using the formula for the sum of an arithmetic sequence.
O(0.5n^2 + 1.5n) = O(n^2).
If we use multi-letter words, we are going to have to find the end of the StringBuffer less frequently, leading to a lower CPU-time and a 'better' case.