There are several questions already on Stack Overflow about allocating an array (say [i32]
) on the heap. The general recommendation is boxing, e.g. Box<[i32]>
. But while boxing works fine enough for smaller arrays, the problem is that the array being boxed has to first be allocated on the stack.
So if the array is too large (say 10 million elements), you will - even with boxing - get a stack overflow (one is unlikely to have a stack that large).
The suggestion then is using Vec<T>
instead, that is Vec<i32>
in our example. And while that does do the job, it does have a performance impact.
Consider the following program:
fn main() {
const LENGTH: usize = 10_000;
let mut a: [i32; LENGTH] = [0; LENGTH];
for j in 0..LENGTH {
for i in 0..LENGTH {
a[i] = j as i32;
}
}
}
time
tells me that this program takes about 2.9 seconds to run. I use 10'000 in this example, so I can allocate that on the stack, but I really want one with 10 million.
Now consider the same program but with Vec<T>
instead:
fn main() {
const LENGTH: usize = 10_000;
let mut a: Vec<i32> = vec![0; LENGTH];
for j in 0..LENGTH {
for i in 0..LENGTH {
a[i] = j as i32;
}
}
}
time
tells me that this program takes about 5 seconds to run. Now time
isn't super exact, but the difference of about 2 seconds for such a simple program is not an insignificant impact.
Storage is storage, the program with array is just as fast when the array is boxed. So it's not the heap slowing the Vec<T>
version down, but the Vec<T>
structure itself.
I also tried with a HashMap
(specifically HashMap<usize, i32>
to mimic an array structure), but that's far slower than the Vec<T>
solution.
If my LENGTH
had been 10 million, the first version wouldn't even have run.
If that's not possible, is there a structure that behaves like an array (and Vec<T>
) on the heap, but can match the speed and performance of an array?
If you really want a heap-allocated array, i.e.
Box<[i32; LENGTH]>
then you can use:It's not going to be faster than using a vector since Rust does bounds-checking even on arrays, but it has a smaller interface which might make sense in terms of software design.
Summary: your benchmark is flawed; just use a
Vec
(as described here), possibly withinto_boxed_slice
, as it is incredibly unlikely to be slower than a heap allocated array.Unfortunately, your benchmarks are flawed. First of all, you probably didn't compile with optimizations (
--release
for cargo,-O
for rustc). Because if you would have, the Rust compiler would have removed all of your code. See the assembly here. Why? Because you never observe the vector/array, so there is no need to do all that work in the first place.Also, your benchmark is not testing what you actually want to test. You are comparing an stack-allocated array with a heap-allocated vector. You should compare the
Vec
to a heap allocated array.Don't feel bad though: writing benchmarks is incredible hard for many reasons. Just remember: if you don't know a lot about writing benchmarks, better don't trust your own benchmarks without asking others first.
I fixed your benchmark and included all three possibilities:
Vec
, array on stack and array on heap. You can find the full code here. The results are:Surprise: the difference between the versions are way less than the variance of the measurement. So we can assume they all are pretty equally fast.
Note that this benchmark is still pretty bad. The two loops can just be replaced by one loop setting all array elements to
LENGTH - 1
. From taking a quick look at the assembly (and from the rather long time of 9ms), I think that LLVM is not smart enough to actually perform this optimization. But things like this are important and one should be aware of that.Finally, let's discuss why both solutions should be equally fast and whether there are actually differences in speed.
The data section of a
Vec<T>
has exactly the same memory layout as a[T]
: just manyT
s contiguously in memory. Super simple. This also means both exhibit the same caching-behavior (specifically, being very cache-friendly).The only difference is that a
Vec
might have more capacity than elements. SoVec
itself stores(pointer, length, capacity)
. That is one word more than a simple (boxed) slice (which stores(pointer, length)
). A boxed array doesn't need to store the length, as it's already in the type, so it is just a simple pointer. Whether or not we store one, two or three words is not really important when you will have millions of elements anyway.Accessing one element is the same for all three: we do a bounds check first and then calculate the target pointer via
base_pointer + size_of::<T>() * index
. But it's important to note that the array storing its length in the type means that the bounds check can be removed more easily by the optimizer! This can be a real advantage.However, bounds checks are already usually removed by the smart optimizer. In the benchmark code I posted above, there are no bounds checks in the assembly. So while a boxed array could be a bit faster by removed bounds checks, (a) this will be a minor performance difference and (b) it's very unlikely that you will have a lot of situations where the bound check is removed for the array but not for the Vec/slice.