In Java, what would be the fastest way to iterate over all the chars in a String? This:
String str = "a really, really long string";
for (int i = 0, n = str.length(); i < n; i++) {
    char c = str.charAt(i);
}
Or this:
char[] chars = str.toCharArray();
for (int i = 0, n = chars.length; i < n; i++) {
    char c = chars[i];
}
EDIT:
What I'd like to know is whether the cost of repeatedly calling the charAt method during a long iteration ends up being less than or greater than the cost of performing a single call to toCharArray at the beginning and then directly accessing the array during the iteration.
It'd be great if someone could provide a robust benchmark for different string lengths, taking into account JIT warm-up time, JVM start-up time, etc., and not just the difference between two calls to System.currentTimeMillis().
The first one, using str.charAt, should be faster. If you dig into the source code of the String class, you can see that charAt is very simple: apart from a bounds check, all it does is index an array and return the value.
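For readers without the JDK source at hand, here is a simplified model of what the answer describes, assuming the JDK 8 layout in which String wraps a private char[] field. CharArrayBackedString is a hypothetical class, not the real java.lang.String; its toCharArray mirrors the copy discussed next.

```java
// Hypothetical model of a JDK 8-style String that wraps a private char[].
final class CharArrayBackedString {
    private final char[] value;

    CharArrayBackedString(String s) {
        // Keep a private copy so callers cannot mutate our contents.
        this.value = s.toCharArray();
    }

    int length() {
        return value.length;
    }

    // charAt: a bounds check followed by a single array read.
    char charAt(int index) {
        if (index < 0 || index >= value.length) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }

    // toCharArray: allocate a new array and copy every element, so the
    // cost (and the garbage) grows with the string length.
    char[] toCharArray() {
        char[] result = new char[value.length];
        System.arraycopy(value, 0, result, 0, value.length);
        return result;
    }
}
```

Because toCharArray hands back a copy, writing to the returned array never changes the original string.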
Now, if you look at the implementation of toCharArray, you will find that it allocates a new array and fills it with System.arraycopy, which is definitely going to be a tad slower than not doing it.
FIRST UPDATE: Before you ever try this in a production environment (not advised), read this first: http://www.javaspecialists.eu/archive/Issue237.html
Starting from Java 9, the solution described here no longer works, because Java now stores strings as byte[] by default.
SECOND UPDATE: As of 2016-10-25, on my AMD x64 8-core machine with source level 1.8, there is no difference between using 'charAt' and field access. It appears that the JVM is sufficiently optimized to inline and streamline any 'string.charAt(n)' calls.
It all depends on the length of the String being inspected. If, as the question says, it is for long strings, the fastest way to inspect the string is to use reflection to access the backing char[] of the string.
A fully randomized benchmark with JDK 8 (win32 and win64) on a 64-bit AMD Phenom II 4-core 955 @ 3.2 GHz (in both client mode and server mode) with 9 different techniques (see below!) shows that using String.charAt(n) is the fastest for small strings and that using reflection to access the String's backing array is almost twice as fast for large strings.
THE EXPERIMENT
9 different optimization techniques are tried.
All string contents are randomized.
The tests are done for a string size of 0 and then powers of two: 1, 2, 4, 8, 16, etc.
The tests are done 1,000 times per string size.
The tests are shuffled into a new random order each time, so across the 1,000 repetitions they never run in a fixed order.
The entire test suite is run forwards and backwards, to show the effect of JVM warm-up on optimization and times.
The entire suite is run twice, once in -client mode and once in -server mode.
CONCLUSIONS
-client mode (32-bit)
For strings 1 to 256 characters in length, calling string.charAt(i) wins, with an average processing rate of 13.4 million to 588 million characters per second. Also, it is overall 5.5% faster (client) and 13.9% faster (server) when the loop check calls string.length() directly than when the length is kept in a local final variable.
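The two loop shapes being compared correspond to "charAt1" and "charAt2" in the list of methods further down. A sketch follows; the original program's whitespace test is not shown, so Character.isWhitespace is an assumption, and the class name is hypothetical.

```java
final class CharAtLoops {
    // "charAt1": the length is read once into a local final variable.
    static boolean charAt1(String s) {
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // "charAt2": s.length() is called on every loop check; the answer
    // reports this shape as the faster one (5.5% client, 13.9% server).
    static boolean charAt2(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

Both methods return the same answer; only the loop check differs, which is the point of the comparison.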
For long strings, 512 to 256K characters in length, using reflection to access the String's backing array is fastest. This technique is almost twice as fast as String.charAt(i) (178% faster). The average speed over this range was 1.111 billion characters per second.
The Field must be obtained ahead of time; it can then be reused by the library on different strings. Interestingly, unlike with charAt, with Field access it is 9% faster to keep the length in a local final variable than to use 'chars.length' in the loop check.
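A sketch of the Field technique, under stated assumptions: it only applies on JDK 7u6 through 8, where String's private value field is a char[] holding exactly the string's characters. On JDK 9+ the field is a byte[], and recent JDKs refuse setAccessible on it without --add-opens, so this version falls back to a charAt loop instead of failing. StringFieldAccess is a hypothetical name and Character.isWhitespace is an assumed predicate.

```java
import java.lang.reflect.Field;

final class StringFieldAccess {
    // Looked up once and reused for every string, as the answer advises.
    // Null when String.value is not a char[] or access is refused.
    private static final Field VALUE = lookupValueField();

    private static Field lookupValueField() {
        try {
            Field f = String.class.getDeclaredField("value");
            // JDK 9+ (compact strings) stores a byte[]; the trick does not apply.
            if (f.getType() != char[].class) {
                return null;
            }
            f.setAccessible(true);
            return f;
        } catch (NoSuchFieldException | RuntimeException e) {
            // RuntimeException covers SecurityException and, on newer JDKs,
            // InaccessibleObjectException thrown by setAccessible.
            return null;
        }
    }

    static boolean fieldAccessAvailable() {
        return VALUE != null;
    }

    static boolean containsWhitespace(String s) {
        if (VALUE != null) {
            try {
                // Assumes the array holds exactly s's characters (JDK 7u6 to 8).
                final char[] chars = (char[]) VALUE.get(s);
                final int len = chars.length; // local final length, per the answer
                for (int i = 0; i < len; i++) {
                    if (Character.isWhitespace(chars[i])) {
                        return true;
                    }
                }
                return false;
            } catch (IllegalAccessException e) {
                // Fall through to the portable loop below.
            }
        }
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

fieldAccessAvailable() reports which path is in use, which is worth checking before drawing conclusions from any timing.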
Special comments on -server mode
Field access started winning after 32-character strings in server mode on a 64-bit JVM on my AMD64 machine. That was not seen until 512-character strings in client mode.
Also worth noting, I think: when I was running JDK 8 (32-bit build) in server mode, overall performance was 7% slower for both large and small strings. This was with build 121 (Dec 2013) of the JDK 8 early-access release. So, for now, it seems that 32-bit server mode is slower than 32-bit client mode.
That being said, it seems the only server mode worth invoking is on a 64-bit machine; otherwise it actually hampers performance.
For the 32-bit build running in -server mode on an AMD64, I can say this:
Also worth saying: String.chars() (both the Stream and the parallel version) is a bust, way slower than any other approach. The Streams API is a rather slow way to perform general string operations.
Wish List
Java's String could have optimized methods that accept a predicate, such as contains(predicate), forEach(consumer), and forEachWithIndex(consumer). Without the user needing to know the length or repeat calls to String methods, these could help parsing libraries (beep-beep beep) speed up. Keep dreaming :)
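String has no such methods, but the wished-for shape can be approximated today with static helpers. The names below are hypothetical illustrations, not a real API, and a helper like this only hides the charAt loop rather than making it faster.

```java
import java.util.function.IntPredicate;

final class StringWishList {
    // Hypothetical callback receiving each position and its char.
    @FunctionalInterface
    interface IndexedCharConsumer {
        void accept(int index, char c);
    }

    // Approximates the wished-for contains(predicate).
    static boolean contains(String s, IntPredicate predicate) {
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            if (predicate.test(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Approximates the wished-for forEachWithIndex(consumer).
    static void forEachWithIndex(String s, IndexedCharConsumer action) {
        final int len = s.length();
        for (int i = 0; i < len; i++) {
            action.accept(i, s.charAt(i));
        }
    }
}
```

For example, StringWishList.contains(text, Character::isWhitespace) performs the same check as the benchmark's methods.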
Happy Strings!
~SH
The test used the following 9 methods of testing the string for the presence of whitespace:
"charAt1" -- CHECK THE STRING CONTENTS THE USUAL WAY
"charAt2" -- SAME AS ABOVE BUT USE String.length() INSTEAD OF MAKING A FINAL LOCAL int FOR THE LENGTH
"stream" -- USE THE NEW JAVA-8 String's IntStream AND PASS IT A PREDICATE TO DO THE CHECKING
"streamPara" -- SAME AS ABOVE, BUT OH-LA-LA - GO PARALLEL!!!
"reuse" -- REFILL A REUSABLE char[] WITH THE STRING'S CONTENTS
"new1" -- OBTAIN A NEW COPY OF THE char[] FROM THE STRING
"new2" -- SAME AS ABOVE, BUT USE "FOR-EACH"
"field1" -- FANCY!! OBTAIN FIELD FOR ACCESS TO THE STRING'S INTERNAL char[]
"field2" -- SAME AS ABOVE, BUT USE "FOR-EACH"
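Five of these variants ("reuse", "new1", "new2", "stream", "streamPara") can be sketched in a few lines each. The class and method names are hypothetical, and Character.isWhitespace is an assumed whitespace test since the original predicate is not shown. "reuse" relies on String.getChars, which copies characters into a caller-supplied array.

```java
final class CopyAndStreamChecks {
    // "reuse": one buffer per instance, grown when needed and refilled with
    // String.getChars; not safe to share between threads.
    private char[] buffer = new char[0];

    boolean reuse(String s) {
        final int len = s.length();
        if (buffer.length < len) {
            buffer = new char[len];
        }
        s.getChars(0, len, buffer, 0);
        // Only the first len slots belong to s; the rest may hold older data.
        for (int i = 0; i < len; i++) {
            if (Character.isWhitespace(buffer[i])) {
                return true;
            }
        }
        return false;
    }

    // "new1": a fresh copy per call, indexed with a local final length.
    static boolean new1(String s) {
        final char[] chars = s.toCharArray();
        final int len = chars.length;
        for (int i = 0; i < len; i++) {
            if (Character.isWhitespace(chars[i])) {
                return true;
            }
        }
        return false;
    }

    // "new2": a fresh copy per call, walked with for-each.
    static boolean new2(String s) {
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c)) {
                return true;
            }
        }
        return false;
    }

    // "stream": String.chars() yields an IntStream of the UTF-16 code units.
    static boolean stream(String s) {
        return s.chars().anyMatch(Character::isWhitespace);
    }

    // "streamPara": the same pipeline, run on the common fork/join pool.
    static boolean streamPara(String s) {
        return s.chars().parallel().anyMatch(Character::isWhitespace);
    }
}
```

Unlike the plain loops, each stream call builds a pipeline object, and the parallel version also splits work across threads.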
COMPOSITE RESULTS FOR CLIENT -client MODE (forwards and backwards tests combined)
Note: on my AMD64 machine, -client mode with 32-bit Java and -server mode with 64-bit Java gave the same results.
COMPOSITE RESULTS FOR SERVER -server MODE (forwards and backwards tests combined)
Note: this is the test for 32-bit Java running in server mode on an AMD64. Server mode for 64-bit Java gave the same results as 32-bit Java in client mode, except that Field access started winning after a string size of 32 characters.
FULL RUNNABLE PROGRAM CODE
(to test on Java 7 and earlier, remove the two streams tests)
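The full program is not reproduced here. As a much smaller, hypothetical sketch of the experiment's structure (random letter-only strings, a size of 0 and then powers of two, a freshly shuffled order each round, a warm-up pass), something like the following could be used; it covers only four of the nine techniques. Timing single calls with System.nanoTime is coarse for short strings, so for trustworthy numbers a dedicated harness such as JMH, which manages warm-up, forking and dead-code elimination, is the better tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Predicate;

final class MiniCharBenchmark {
    // A subset of the techniques; names mirror the list above.
    static Map<String, Predicate<String>> techniques() {
        Map<String, Predicate<String>> m = new LinkedHashMap<>();
        m.put("charAt1", s -> {
            final int len = s.length();
            for (int i = 0; i < len; i++) {
                if (Character.isWhitespace(s.charAt(i))) return true;
            }
            return false;
        });
        m.put("charAt2", s -> {
            for (int i = 0; i < s.length(); i++) {
                if (Character.isWhitespace(s.charAt(i))) return true;
            }
            return false;
        });
        m.put("new2", s -> {
            for (char c : s.toCharArray()) {
                if (Character.isWhitespace(c)) return true;
            }
            return false;
        });
        m.put("stream", s -> s.chars().anyMatch(Character::isWhitespace));
        return m;
    }

    // Letters only, so every technique has to scan the whole string.
    static String randomString(Random rnd, int len) {
        char[] chars = new char[len];
        for (int i = 0; i < len; i++) {
            chars[i] = (char) ('a' + rnd.nextInt(26));
        }
        return new String(chars);
    }

    // Runs each technique `rounds` times on fresh random strings of `len`
    // chars, reshuffling the order every round; returns total ns per technique.
    static Map<String, Long> run(int len, int rounds, long seed) {
        Random rnd = new Random(seed);
        Map<String, Predicate<String>> all = techniques();
        List<String> order = new ArrayList<>(all.keySet());
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String name : order) totals.put(name, 0L);
        int found = 0; // results are used so the calls are not trivially dead code
        for (int r = 0; r < rounds; r++) {
            String s = randomString(rnd, len);
            Collections.shuffle(order, rnd);
            for (String name : order) {
                long t0 = System.nanoTime();
                boolean hit = all.get(name).test(s);
                totals.merge(name, System.nanoTime() - t0, Long::sum);
                if (hit) found++;
            }
        }
        if (found != 0) throw new IllegalStateException("unexpected whitespace");
        return totals;
    }

    public static void main(String[] args) {
        run(1 << 12, 2_000, 1L); // warm-up pass, results discarded
        for (int len = 0; len <= 1 << 14; len = (len == 0) ? 1 : len * 2) {
            StringBuilder line = new StringBuilder("len=" + len);
            for (Map.Entry<String, Long> e : run(len, 1_000, 42L).entrySet()) {
                line.append(String.format("  %s=%dus", e.getKey(), e.getValue() / 1_000));
            }
            System.out.println(line);
        }
    }
}
```

Running the size loop a second time in reverse order would mimic the forwards-and-backwards pass described above.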