Fastest way to create new array with length N and

2020-06-03 02:08发布

I want to allocate a new array with the length N and fill it up by repeating a given array. The interface looks like this:

<T> T[] repeat(T[] array, int n);

To clarify what I mean here is a small example:

String a = {"a", "b", "c"};
//     b = {"a", "b", "c", "a", "b", "c", "a", "b", "c", "a"}
String b = repeat(a, 10); 

Most of the programmer will come up with the following solution (for simplicity of array generation a specific type was chosen):

public String[] repeat(String[] array, int n) {
  String[] repeated = new String[n];
  for (int i = 0; i < n; i++) {
    repeated[i] = array[i % array.length];
  }
  return repeated;
}

Is there a faster way to do this?

2条回答
走好不送
2楼-- · 2020-06-03 02:49

I came up with this generic solution:

public static <T> T[] repeat(T[] arr, int newLength) {
  T[] dup = Arrays.copyOf(arr, newLength);
  for (long last = arr.length; last != 0 && last < newLength; last <<= 1) {
    System.arraycopy(dup, 0, dup, (int) last, (int) (Math.min(last << 1, newLength) - last));
  }
  return dup;
}

Theory

System.arraycopy is a native call. Therefore it is very fast but it doesn't mean it is the fastest way.

Every other solution copys the array element by element. My solution copys larger blocks. Every iteration duplicates the existing elements in the array which means the loop will run at most log2(n) times.

Profiling reports

Input

TEST_ARRAY = { "a", "b", "c", "d", "e", "f" }
NEW_LENGTH = 10000

Here is my benchmark code to reproduce the results:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

@Fork(3)
@BenchmarkMode(Mode.AverageTime)
@Measurement(iterations = 10, timeUnit = TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Threads(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.NANOSECONDS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class MyBenchmark {

  private static final String[] TEST_ARRAY = { "a", "b", "c", "d", "e", "f" };
  private static final int NEW_LENGTH = 10_000;

  @Benchmark
  public String[] testNativeCall() {
    String[] dup = Arrays.copyOf(TEST_ARRAY, NEW_LENGTH);
    for (int last = TEST_ARRAY.length; last != 0 && last < NEW_LENGTH; last <<= 1) {
      System.arraycopy(dup, 0, dup, last, Math.min(last << 1, NEW_LENGTH) - last);
    }
    return dup;
  }

  @Benchmark
  public String[] testLoopModulo() {
    String[] arr = new String[NEW_LENGTH];
    for (int i = 0; i < NEW_LENGTH; i++) {
      arr[i] = arr[i % TEST_ARRAY.length];
    }
    return arr;
  }

  @Benchmark
  public String[] testArrayList() {
    List<String> initialLetters = Arrays.asList(TEST_ARRAY);
    List<String> results = new ArrayList<>();
    int indexOfLetterToAdd = 0;
    for (int i = 0; i < 10000; i++) {
      results.add(initialLetters.get(indexOfLetterToAdd++));
      if (indexOfLetterToAdd == initialLetters.size()) {
        indexOfLetterToAdd = 0;
      }
    }
    return results.toArray(new String[results.size()]);
  }

  @Benchmark
  public String[] testLoopReset() {
    String result[] = new String[NEW_LENGTH];
    for (int i = 0, j = 0; i < NEW_LENGTH && j < TEST_ARRAY.length; i++, j++) {
      result[i] = TEST_ARRAY[j];
      if (j == TEST_ARRAY.length - 1) {
        j = -1;
      }
    }
    return result;
  }

  @Benchmark
  public String[] testStream() {
    String[] result = Stream.iterate(TEST_ARRAY, x -> x).flatMap(x -> Stream.of(TEST_ARRAY)).limit(NEW_LENGTH)
        .toArray(String[]::new);
    return result;
  }
}

Results

Benchmark                    Mode  Cnt      Score      Error  Units
MyBenchmark.testNativeCall   avgt   30   4154,553 ±   11,242  ns/op
MyBenchmark.testLoopModulo   avgt   30  19273,717 ±  235,547  ns/op
MyBenchmark.testArrayList    avgt   30  71079,139 ± 2686,136  ns/op
MyBenchmark.testLoopReset    avgt   30  18307,368 ±  202,520  ns/op
MyBenchmark.testStream       avgt   30  68898,278 ± 2488,104  ns/op

As you can see the native call method is the fastest way to repeat an array.

Additional Results

Further I was asked to benchmark these methods with various inputs.

Input ranges

// Array size not fixed anymore - filled with random elements
SIZE = { 100, 1000, 100000, 1000000 }
NEW_LENGTH = { 100, 1000, 100000, 1000000 }

That means there are SIZE x NEW_LENGTH tests and here are the results:

Benchmark                   (NEW_LENGTH)   (SIZE)  Mode  Cnt        Score        Error  Units
MyBenchmark.testArrayList            100      100  avgt   30      706,274 ±      6,787  ns/op
MyBenchmark.testArrayList            100     1000  avgt   30      692,586 ±     15,076  ns/op
MyBenchmark.testArrayList            100   100000  avgt   30      685,214 ±      6,747  ns/op
MyBenchmark.testArrayList            100  1000000  avgt   30      685,333 ±      5,493  ns/op
MyBenchmark.testArrayList           1000      100  avgt   30     7170,897 ±     63,221  ns/op
MyBenchmark.testArrayList           1000     1000  avgt   30     7180,612 ±     93,280  ns/op
MyBenchmark.testArrayList           1000   100000  avgt   30     6818,585 ±    197,859  ns/op
MyBenchmark.testArrayList           1000  1000000  avgt   30     6810,614 ±    139,456  ns/op
MyBenchmark.testArrayList         100000      100  avgt   30   597614,173 ±   6446,318  ns/op
MyBenchmark.testArrayList         100000     1000  avgt   30   580696,750 ±   5141,845  ns/op
MyBenchmark.testArrayList         100000   100000  avgt   30   598657,608 ±   5126,519  ns/op
MyBenchmark.testArrayList         100000  1000000  avgt   30   595529,027 ±   4981,095  ns/op
MyBenchmark.testArrayList        1000000      100  avgt   30  6836746,484 ±  38848,467  ns/op
MyBenchmark.testArrayList        1000000     1000  avgt   30  6745066,786 ±  57971,469  ns/op
MyBenchmark.testArrayList        1000000   100000  avgt   30  7130391,072 ±  50583,914  ns/op
MyBenchmark.testArrayList        1000000  1000000  avgt   30  8791342,042 ± 172323,938  ns/op
MyBenchmark.testLoopModulo           100      100  avgt   30      301,252 ±      1,195  ns/op
MyBenchmark.testLoopModulo           100     1000  avgt   30      301,988 ±      2,056  ns/op
MyBenchmark.testLoopModulo           100   100000  avgt   30      299,892 ±      1,776  ns/op
MyBenchmark.testLoopModulo           100  1000000  avgt   30      300,468 ±      2,569  ns/op
MyBenchmark.testLoopModulo          1000      100  avgt   30     3277,018 ±     14,880  ns/op
MyBenchmark.testLoopModulo          1000     1000  avgt   30     3275,648 ±     21,742  ns/op
MyBenchmark.testLoopModulo          1000   100000  avgt   30     3258,570 ±     27,360  ns/op
MyBenchmark.testLoopModulo          1000  1000000  avgt   30     3259,617 ±     28,747  ns/op
MyBenchmark.testLoopModulo        100000      100  avgt   30   321483,331 ±   4320,938  ns/op
MyBenchmark.testLoopModulo        100000     1000  avgt   30   326319,662 ±   2419,602  ns/op
MyBenchmark.testLoopModulo        100000   100000  avgt   30   327027,966 ±   3174,011  ns/op
MyBenchmark.testLoopModulo        100000  1000000  avgt   30   319201,057 ±   4472,220  ns/op
MyBenchmark.testLoopModulo       1000000      100  avgt   30  3053122,364 ±  31814,342  ns/op
MyBenchmark.testLoopModulo       1000000     1000  avgt   30  3134151,676 ± 108227,023  ns/op
MyBenchmark.testLoopModulo       1000000   100000  avgt   30  3220082,188 ±  43925,401  ns/op
MyBenchmark.testLoopModulo       1000000  1000000  avgt   30  3204777,236 ±  25365,542  ns/op
MyBenchmark.testLoopReset            100      100  avgt   30      159,828 ±      1,107  ns/op
MyBenchmark.testLoopReset            100     1000  avgt   30      125,461 ±      0,881  ns/op
MyBenchmark.testLoopReset            100   100000  avgt   30      129,912 ±      7,801  ns/op
MyBenchmark.testLoopReset            100  1000000  avgt   30      134,503 ±      7,602  ns/op
MyBenchmark.testLoopReset           1000      100  avgt   30     1809,207 ±     93,642  ns/op
MyBenchmark.testLoopReset           1000     1000  avgt   30     1728,705 ±     70,808  ns/op
MyBenchmark.testLoopReset           1000   100000  avgt   30     1354,887 ±      9,631  ns/op
MyBenchmark.testLoopReset           1000  1000000  avgt   30     1350,327 ±     15,886  ns/op
MyBenchmark.testLoopReset         100000      100  avgt   30   159680,209 ±   2477,183  ns/op
MyBenchmark.testLoopReset         100000     1000  avgt   30   162030,985 ±   1949,660  ns/op
MyBenchmark.testLoopReset         100000   100000  avgt   30   149299,890 ±   1516,486  ns/op
MyBenchmark.testLoopReset         100000  1000000  avgt   30   136059,242 ±   3090,410  ns/op
MyBenchmark.testLoopReset        1000000      100  avgt   30  1407369,992 ±  12979,717  ns/op
MyBenchmark.testLoopReset        1000000     1000  avgt   30  1447001,173 ±  14979,769  ns/op
MyBenchmark.testLoopReset        1000000   100000  avgt   30  1463913,706 ±  12564,617  ns/op
MyBenchmark.testLoopReset        1000000  1000000  avgt   30  1404701,860 ±  21587,436  ns/op
MyBenchmark.testNativeCall           100      100  avgt   30       58,306 ±      0,669  ns/op
MyBenchmark.testNativeCall           100     1000  avgt   30       57,441 ±      0,590  ns/op
MyBenchmark.testNativeCall           100   100000  avgt   30       57,595 ±      0,386  ns/op
MyBenchmark.testNativeCall           100  1000000  avgt   30       60,196 ±      1,995  ns/op
MyBenchmark.testNativeCall          1000      100  avgt   30      450,808 ±      8,259  ns/op
MyBenchmark.testNativeCall          1000     1000  avgt   30      558,079 ±      5,724  ns/op
MyBenchmark.testNativeCall          1000   100000  avgt   30      557,246 ±      4,873  ns/op
MyBenchmark.testNativeCall          1000  1000000  avgt   30      565,005 ±      9,696  ns/op
MyBenchmark.testNativeCall        100000      100  avgt   30    73074,811 ±   3332,432  ns/op
MyBenchmark.testNativeCall        100000     1000  avgt   30    70970,603 ±   2693,394  ns/op
MyBenchmark.testNativeCall        100000   100000  avgt   30    69907,864 ±   2945,072  ns/op
MyBenchmark.testNativeCall        100000  1000000  avgt   30    74041,205 ±   2599,841  ns/op
MyBenchmark.testNativeCall       1000000      100  avgt   30   790679,353 ±  15672,480  ns/op
MyBenchmark.testNativeCall       1000000     1000  avgt   30   812660,137 ±  25490,999  ns/op
MyBenchmark.testNativeCall       1000000   100000  avgt   30   838094,181 ±  12374,194  ns/op
MyBenchmark.testNativeCall       1000000  1000000  avgt   30   925567,535 ±  19091,943  ns/op
MyBenchmark.testStream               100      100  avgt   30      810,262 ±     54,519  ns/op
MyBenchmark.testStream               100     1000  avgt   30     1344,998 ±     14,792  ns/op
MyBenchmark.testStream               100   100000  avgt   30   159901,562 ±   3453,210  ns/op
MyBenchmark.testStream               100  1000000  avgt   30  1407506,571 ± 419985,287  ns/op
MyBenchmark.testStream              1000      100  avgt   30     6464,099 ±    169,665  ns/op
MyBenchmark.testStream              1000     1000  avgt   30     5869,457 ±    260,297  ns/op
MyBenchmark.testStream              1000   100000  avgt   30   165394,656 ±   4943,362  ns/op
MyBenchmark.testStream              1000  1000000  avgt   30  1352900,959 ± 412849,634  ns/op
MyBenchmark.testStream            100000      100  avgt   30   423531,274 ±   3944,801  ns/op
MyBenchmark.testStream            100000     1000  avgt   30   391727,181 ±   5341,826  ns/op
MyBenchmark.testStream            100000   100000  avgt   30   427462,700 ±   7517,953  ns/op
MyBenchmark.testStream            100000  1000000  avgt   30   981304,769 ±  10206,849  ns/op
MyBenchmark.testStream           1000000      100  avgt   30  4528465,859 ±  72959,405  ns/op
MyBenchmark.testStream           1000000     1000  avgt   30  4121720,516 ±  60283,781  ns/op
MyBenchmark.testStream           1000000   100000  avgt   30  5920334,609 ±  63051,631  ns/op
MyBenchmark.testStream           1000000  1000000  avgt   30  6227476,270 ±  84066,493  ns/op

As expected the native call is always ahead (about 2 times faster than the loop version).

查看更多
Summer. ? 凉城
3楼-- · 2020-06-03 03:02

You don't need to Min on each loop, you can extract that to afterwards. You also double the offset twice in each loop.

However, mine doesn't appear to measurably faster, sometimes faster, sometimes not, but it's an alternative to the Min call.

Benchmark                          Mode  Samples     Score  Score error  Units
c.e.MyBenchmark.testNativeCall     avgt       30  4027.890       26.544  ns/op
c.e.MyBenchmark.westonsRepeat      avgt       30  4035.817       49.168  ns/op

Code:

@Benchmark
public String[] westonsRepeat() {
    String[] sourceArray = TEST_ARRAY;
    int newLength = NEW_LENGTH;

    int offset = sourceArray.length;
    String[] dup = Arrays.copyOf(sourceArray, newLength);
    if (offset == 0 || offset >= newLength) return dup;

    int next = offset << 1;
    while (next < newLength) {
        System.arraycopy(dup, 0, dup, offset, offset);
        offset = next;
        next <<= 1;
    }
    System.arraycopy(dup, 0, dup, offset, newLength - offset);
    return dup;
}

The other thing I tried was calculating the number of times I could safely double and while it was some what a dead end performance wise I thought it interesting to show:

int timesCanDouble = (int)(log2(newLength) - log2(offset));
=>
int timesCanDouble = (int)(log2(newLength/offset));
=>
int timesCanDouble = 31 - numberOfLeadingZeros(newLength/offset));
查看更多
登录 后发表回答