可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
It's a shame for me, but I did not know that:
You should use clone to copy arrays, because that's generally the
fastest way to do it.
as Josh Bloch states in this blog: http://www.artima.com/intv/bloch13.html
I always used System.arraycopy(...)
.
Both approaches are native, so probably without going deeper into the sources of libraries I can not figure out, why it is so.
My question is simple:
why is it the fastest way?
What is the difference with System.arraycopy
?
The difference is explained here, but it does not answer the question why Josh Bloch considers clone()
as the fastest way.
回答1:
I would like to make some points about why clone()
is the fastest way to copy an array than System.arraycopy(..)
or others:
1. clone()
doesn't have to do the typechecking before copying a source array to the destination one as provided here. It just simple allocates new memory space and assigns the objects to it. On the other hand, System.arraycopy(..)
checks for the type and then copies an array.
2. clone()
also breaks the optimization to eliminate redundant zeroing. As you know, every allocated array in Java must be initialized with 0s
or respective default values. However, JIT can avoid zeroing this array if it sees that the array is filled right after creation. That makes it definitely faster compared to changing the copy values with existing 0s
or respective default values. While using System.arraycopy(..)
spends significant amount of time clearing and copying the initialized array. To do so I have performed some of the benchmark tests.
@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, batchSize = 1000)
@Measurement(iterations = 10, time = 1, batchSize = 1000)
public class BenchmarkTests {
@Param({"1000","100","10","5", "1"})
private int size;
private int[] original;
@Setup
public void setup() {
original = new int[size];
for (int i = 0; i < size; i++) {
original[i] = i;
}
}
@Benchmark
public int[] SystemArrayCopy() {
final int length = size;
int[] destination = new int[length];
System.arraycopy(original, 0, destination, 0, length);
return destination;
}
@Benchmark
public int[] arrayClone() {
return original.clone();
}
}
Output:
Benchmark (size) Mode Cnt Score Error Units
ArrayCopy.SystemArrayCopy 1 thrpt 10 26324.251 ± 1532.265 ops/s
ArrayCopy.SystemArrayCopy 5 thrpt 10 26435.562 ± 2537.114 ops/s
ArrayCopy.SystemArrayCopy 10 thrpt 10 27262.200 ± 2145.334 ops/s
ArrayCopy.SystemArrayCopy 100 thrpt 10 10524.117 ± 474.325 ops/s
ArrayCopy.SystemArrayCopy 1000 thrpt 10 984.213 ± 121.934 ops/s
ArrayCopy.arrayClone 1 thrpt 10 55832.672 ± 4521.112 ops/s
ArrayCopy.arrayClone 5 thrpt 10 48174.496 ± 2728.928 ops/s
ArrayCopy.arrayClone 10 thrpt 10 46267.482 ± 4641.747 ops/s
ArrayCopy.arrayClone 100 thrpt 10 19837.480 ± 364.156 ops/s
ArrayCopy.arrayClone 1000 thrpt 10 1841.145 ± 110.322 ops/s
According to the outputs I am getting that clone
is almost twice fast from System.arraycopy(..)
3. Also, using manual copying method like clone()
results into faster ouput because it doesn't have to make any VM calls (unlike System.arraycopy()
).
回答2:
For one thing, clone()
doesn't have to do the typechecking that System.arraycopy()
does.
回答3:
I want to correct and complement previous answers.
- Object.clone uses unchecked System.arraycopy implementation for arrays;
- The main performance improvement of Object.clone, it is initialization of RAW memory directly. In the case of System.arraycopy it also tries to combine array initialization with copy operation, as we can see in source code, but it also does different additional checks for this, unlike Object.clone. If you just disable this feature (see below), then performance would be very closer (in particularly on my hardware).
- One more interesting thing is about Young vs Old Gen. In case when source array aligned and inside Old Gen, both methods have close performance.
- When we copy primitive arrays System.arraycopy always uses generate_unchecked_arraycopy.
- It depends from hardware/OS dependent implementations, so don't trust benchmarks and assumptions, check on you own.
Explanation
First of all clone method and System.arraycopy are intrinsics.
Object.clone and System.arraycopy use generate_unchecked_arraycopy.
And if we go deeper we could see that after that HotSpot select concrete implementation, dependent from OS, etc.
Longly.
Let's see the code from Hotspot.
First of all we will see that Object.clone (LibraryCallKit::inline_native_clone) uses generate_arraycopy, which used for System.arraycopy in case of -XX:-ReduceInitialCardMarks. Otherwise it does LibraryCallKit::copy_to_clone, which initialize new array in RAW memory (if -XX:+ReduceBulkZeroing, which enabled by default).
In contrast System.arraycopy uses generate_arraycopy directly, try to check ReduceBulkZeroing (and many others cases) and eliminate array zeroing too, with mentioned additional checks and also it would make additional checks to make sure that all elements are initialized, unlike Object.clone. Finally, in best case both of them use generate_unchecked_arraycopy.
Below I show some benchmarks to see this effect on practice:
- First one is just simple benchmark, the only difference from previous answer, that arrays is not sorted; We see that arraycopy is slower (but not two times), results - https://pastebin.com/ny56Ag1z;
- Secondly, I add option -XX:-ReduceBulkZeroing and now I see that the performance of both methods is very closer. Results - https://pastebin.com/ZDAeQWwx;
- I also assume that we will have the difference between Old/Young, because of arrays alignment (it is a feature of Java GC, when we call GC, alignment of arrays is changed, it is easy to observe using JOL). I was surprised that performance become the same, generally, and downgrade for both methods. Results - https://pastebin.com/bTt5SJ8r. For whom who believes in concrete numbers, throughput of System.arraycopy is better then Object.clone.
First benchmark:
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
@State(Scope.Benchmark)
@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CloneVsArraycopy {
@Param({"10", "1000", "100000"})
int size;
int[] source;
@Setup(Level.Invocation)
public void setup() {
source = create(size);
}
@Benchmark
public int[] clone(CloneVsArraycopy cloneVsArraycopy) {
return cloneVsArraycopy.source.clone();
}
@Benchmark
public int[] arraycopy(CloneVsArraycopy cloneVsArraycopy) {
int[] dest = new int[cloneVsArraycopy.size];
System.arraycopy(cloneVsArraycopy.source, 0, dest, 0, dest.length);
return dest;
}
public static void main(String[] args) throws Exception {
new Runner(new OptionsBuilder()
.include(CloneVsArraycopy.class.getSimpleName())
.warmupIterations(20)
.measurementIterations(20)
.forks(20)
.build()).run();
}
private static int[] create(int size) {
int[] a = new int[size];
for (int i = 0; i < a.length; i++) {
a[i] = ThreadLocalRandom.current().nextInt();
}
return a;
}
}
Running this test on my PC, I got this - https://pastebin.com/ny56Ag1z.
The difference is not so big, but still exists.
The second benchmark I only add one setting -XX:-ReduceBulkZeroing and got this results https://pastebin.com/ZDAeQWwx. No we see that for Young Gen the difference is much less too.
In third benchmark I changed only setup method and enable ReduceBulkZeroing option back:
@Setup(Level.Invocation)
public void setup() {
source = create(size);
// try to move to old gen/align array
for (int i = 0; i < 10; ++i) {
System.gc();
}
}
The difference is much less (maybe in error interval) - https://pastebin.com/bTt5SJ8r.
Disclaimer
It is also could be wrong. You should check on your own.
In addition
I think, it is interesting to look on benchmarks process:
# Benchmark: org.egorlitvinenko.arrays.CloneVsArraycopy.arraycopy
# Parameters: (size = 50000)
# Run progress: 0,00% complete, ETA 00:07:30
# Fork: 1 of 5
# Warmup Iteration 1: 8,870 ops/ms
# Warmup Iteration 2: 10,912 ops/ms
# Warmup Iteration 3: 16,417 ops/ms <- Hooray!
# Warmup Iteration 4: 17,924 ops/ms <- Hooray!
# Warmup Iteration 5: 17,321 ops/ms <- Hooray!
# Warmup Iteration 6: 16,628 ops/ms <- What!
# Warmup Iteration 7: 14,286 ops/ms <- No, stop, why!
# Warmup Iteration 8: 13,928 ops/ms <- Are you kidding me?
# Warmup Iteration 9: 13,337 ops/ms <- pff
# Warmup Iteration 10: 13,499 ops/ms
Iteration 1: 13,873 ops/ms
Iteration 2: 16,177 ops/ms
Iteration 3: 14,265 ops/ms
Iteration 4: 13,338 ops/ms
Iteration 5: 15,496 ops/ms
For Object.clone
# Benchmark: org.egorlitvinenko.arrays.CloneVsArraycopy.clone
# Parameters: (size = 50000)
# Run progress: 0,00% complete, ETA 00:03:45
# Fork: 1 of 5
# Warmup Iteration 1: 8,761 ops/ms
# Warmup Iteration 2: 12,673 ops/ms
# Warmup Iteration 3: 20,008 ops/ms
# Warmup Iteration 4: 20,340 ops/ms
# Warmup Iteration 5: 20,112 ops/ms
# Warmup Iteration 6: 20,061 ops/ms
# Warmup Iteration 7: 19,492 ops/ms
# Warmup Iteration 8: 18,862 ops/ms
# Warmup Iteration 9: 19,562 ops/ms
# Warmup Iteration 10: 18,786 ops/ms
We can observe perfomance downgrade here for System.arraycopy. I saw similar picture for Streams and there was a bug in compilers.
I suppose it could be a bug in compilers too. Anyway, it is strange that after 3 warmup performance downgrades.
UPDATE
What is about typechecking
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
@State(Scope.Benchmark)
@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class CloneVsArraycopyObject {
@Param({"100"})
int size;
AtomicLong[] source;
@Setup(Level.Invocation)
public void setup() {
source = create(size);
}
@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public AtomicLong[] clone(CloneVsArraycopyObject cloneVsArraycopy) {
return cloneVsArraycopy.source.clone();
}
@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public AtomicLong[] arraycopy(CloneVsArraycopyObject cloneVsArraycopy) {
AtomicLong[] dest = new AtomicLong[cloneVsArraycopy.size];
System.arraycopy(cloneVsArraycopy.source, 0, dest, 0, dest.length);
return dest;
}
public static void main(String[] args) throws Exception {
new Runner(new OptionsBuilder()
.include(CloneVsArraycopyObject.class.getSimpleName())
.jvmArgs("-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintInlining", "-XX:-ReduceBulkZeroing")
.warmupIterations(10)
.measurementIterations(5)
.forks(5)
.build())
.run();
}
private static AtomicLong[] create(int size) {
AtomicLong[] a = new AtomicLong[size];
for (int i = 0; i < a.length; i++) {
a[i] = new AtomicLong(ThreadLocalRandom.current().nextLong());
}
return a;
}
}
Difference is not observed - https://pastebin.com/ufxCZVaC.
I suppose an explanation is simple, as System.arraycopy is hot intrinsic in that case, the real implementation would be just inlined without any typecheking, etc.
Note
I agreed with Radiodef you could find interesting to read blog post, the author of this blog is the creator (or one of creators) of JMH.
回答4:
As far as copy is concerned System.arrayCopy
is fastest, then and now.
System.arrayCopy
doesn't create a new array and can't be beaten in raw copy speed.
Arrays.copyOf
simply creates an array and calls arrayCopy
. Convenience.
Array.clone
is highly efficient, but it need to flush copied data to all cpu cache.
If you can code in a way to reuse array with arrayCopy
, go for it.
Otherwise I personally recommend copyOf
given the upward trend of cpu cores and because clone is considered old and problematic in general - the main point of the Josh Bloch blog that started this question.
Contrary to common thoughts, the actual copy loops (type checked or not) are not Java bytecode and is not optimizable by hotspot.
The loops are coded in C++ and is low level jvm implementation.
Long answer:
This answer is based on and link to source code of OpenJDK 8, which as far as I know should be the same for Sun.
Array copy is perhaps more complicated than what most people think. On C code level, it can be split into three cases:
- Primitive arrays are copied directly with a copy loop.
- Object arrays of same class, or subclass to super class array, are also copied directly.
- Otherwise, between arrays of different classes, a type check is made on each element.
The absolute speed of copying an array will thus vary greatly depends on the array type.
The relative speed of the three clone methods does not, however, since they all resolve to the same copy loop, an inlined C++ or assembly loop.
Thus the different in speed is mainly caused by overheads and other factors.
System.arrayCopy
is essentially type checks and length checks, then straight to the copy loop.
In my own tests arrayCopy
is always faster than the other two methods well beyond any margin of errors.
Arrays.copyOf
simply calls System.arrayCopy
- after creating a new array.
Note that it does not call Array.clone.
Contrary to Radiodef's comment, there is no indication that Java 8 will bypass the zero initialisation.
Array.clone
is interesting.
It directly calls heap allocation and copy loop, with minimal checks.
Thus its array creation should be quicker than Arrays.copyOf
, and its copy as fast as System.arrayCopy
if not faster.
But in my tests Array.clone
is slightly slower than copyOf
.
I suspect that it's because of the memory barrier after the copy.
Like a constructor, clone
will make sure the copied data are visible to all threads - which neither System.arrayCopy
nor Array.copyOf
do.
This means Array.clone
need to spend time on waiting CPU cache to update.
If this is true, the result of Array.clone
vs Arrays.copyOf
depends on whether the cache flush of clone
is faster than overheads of copyOf
, and should be platform dependent.
Other than this, since cloning always result in an array of same type, all three methods ultimately use the same copy loop.
If you only want to copy, arrayCopy
is always fastest, simply because it doesn't create a new array.
For the rest, if the java mailing list is anything to go by, the choice between Arrays.copyOf
and Array.clone
should be largely a matter of taste.
My jmh test result and code below.
- One way tests return copied array.
- Two way tests overwrite copy source which force next clone to copy "new" data.
NoClone
does not clone anything and is a yardstick to make sure higher is faster.
As stated, Clone and CopyOf is a close race and your mileage may vary.
/* # Run complete. Total time: 00:06:44
Benchmark Mode Cnt Score Error Units
MyBenchmark.ArrayCloneByteOneWay thrpt 20 1048588.503 ± 2608.862 ops/s
MyBenchmark.ArrayCloneByteTwoWay thrpt 20 523782.848 ± 1613.823 ops/s
MyBenchmark.ArrayCloneObjOneWay thrpt 20 260903.006 ± 1311.827 ops/s
MyBenchmark.ArrayCloneObjTwoWay thrpt 20 129448.639 ± 1179.122 ops/s
MyBenchmark.ArraysCopyOfByteOneWay thrpt 20 1065995.804 ± 2197.919 ops/s
MyBenchmark.ArraysCopyOfByteTwoWay thrpt 20 533025.610 ± 2831.955 ops/s
MyBenchmark.ArraysCopyOfObjOneWay thrpt 20 266134.565 ± 1536.756 ops/s
MyBenchmark.ArraysCopyOfObjTwoWay thrpt 20 130821.380 ± 274.325 ops/s
MyBenchmark.NoClone thrpt 20 308776528.157 ± 2546848.128 ops/s
MyBenchmark.SystemArrayCopyByteOneWay thrpt 20 1232733.367 ± 8439.409 ops/s
MyBenchmark.SystemArrayCopyByteTwoWay thrpt 20 859387.983 ± 1919.359 ops/s
MyBenchmark.SystemArrayCopyObjOneWay thrpt 20 239532.442 ± 775.193 ops/s
MyBenchmark.SystemArrayCopyObjTwoWay thrpt 20 167235.661 ± 503.141 ops/s
*/
import java.util.Arrays;
import java.util.Random;
import org.openjdk.jmh.annotations.*;
@Fork(2) @Warmup(iterations = 5, time = 1) @Measurement(iterations = 10, time = 1)
public class Q46230557 {
private static final int ARRAY_SIZE = 8192;
@State(Scope.Thread) public static class Data {
public byte[] bytes = new byte[ ARRAY_SIZE ];
public Object[] objs = new Object[ ARRAY_SIZE ];
@Setup public void setup() {
final Random RNG = new Random();
RNG.nextBytes( bytes );
for ( int i = 0 ; i < ARRAY_SIZE ; i++ )
objs[i] = RNG.nextInt();
}
}
@Benchmark public byte[] NoClone( final Data data ) {
return data.bytes;
}
@Benchmark public byte[] SystemArrayCopyByteOneWay( final Data data ) {
final byte[] dest = new byte[ ARRAY_SIZE ];
System.arraycopy( data.bytes, 0, dest, 0, ARRAY_SIZE );
return dest;
}
@Benchmark public byte[] SystemArrayCopyByteTwoWay( final Data data ) {
final byte[] buf = new byte[ ARRAY_SIZE ];
System.arraycopy( data.bytes, 0, buf, 0, ARRAY_SIZE );
System.arraycopy( buf, 0, data.bytes, 0, ARRAY_SIZE );
return data.bytes;
}
@Benchmark public byte[] ArraysCopyOfByteOneWay( final Data data ) {
return Arrays.copyOf( data.bytes, ARRAY_SIZE );
}
@Benchmark public byte[] ArraysCopyOfByteTwoWay( final Data data ) {
final byte[] buf = Arrays.copyOf( data.bytes, ARRAY_SIZE );
return data.bytes = Arrays.copyOf( buf, ARRAY_SIZE );
}
@Benchmark public byte[] ArrayCloneByteOneWay( final Data data ) {
return data.bytes.clone();
}
@Benchmark public byte[] ArrayCloneByteTwoWay( final Data data ) {
final byte[] buf = data.bytes.clone();
return data.bytes = buf.clone();
}
@Benchmark public Object[] SystemArrayCopyObjOneWay( final Data data ) {
final Object[] dest = new Object[ ARRAY_SIZE ];
System.arraycopy( data.objs, 0, dest, 0, ARRAY_SIZE );
return dest;
}
@Benchmark public Object[] SystemArrayCopyObjTwoWay( final Data data ) {
final Object[] buf = new Object[ ARRAY_SIZE ];
System.arraycopy( data.objs, 0, buf, 0, ARRAY_SIZE );
System.arraycopy( buf, 0, data.objs, 0, ARRAY_SIZE );
return data.objs;
}
@Benchmark public Object[] ArraysCopyOfObjOneWay( final Data data ) {
return Arrays.copyOf( data.objs, ARRAY_SIZE );
}
@Benchmark public Object[] ArraysCopyOfObjTwoWay( final Data data ) {
final Object[] buf = Arrays.copyOf( data.objs, ARRAY_SIZE );
return data.objs = Arrays.copyOf( buf, ARRAY_SIZE );
}
@Benchmark public Object[] ArrayCloneObjOneWay( final Data data ) {
return data.objs.clone();
}
@Benchmark public Object[] ArrayCloneObjTwoWay( final Data data ) {
final Object[] buf = data.objs.clone();
return data.objs = buf.clone();
}
}
回答5:
The difference in performance comes from skipping the step where the array is zeroed out.
public static int[] copyUsingArraycopy(int[] original)
{
// Memory is allocated and zeroed out
int[] copy = new int[original.Length];
// Memory is copied
System.arraycopy(original, 0, copy, 0, original.length);
}
public static int[] copyUsingClone(int[] original)
{
// Memory is allocated, but not zeroed out
// Unitialized memory is then copied into
return (int[])original.clone();
}
However, in cases where the performance of copying an array makes a significant difference it is generally better to employ double buffering.
int[] backBuffer = new int[BUFFER_SIZE];
int[] frontBuffer = new int[BUFFER_SIZE];
...
// Swap buffers
int[] temp = frontBuffer;
frontBuffer = backBuffer;
backBuffer = temp;
System.arraycopy(frontBuffer, 0, backBuffer, 0, BUFFER_SIZE);