What is faster - accessing elements of a multiarray using element selection operator, or traversing over the multiarray using iterators?
In my case, I need to do a full pass over all elements of multiarray each time.
What is faster - accessing elements of a multiarray using element selection operator, or traversing over the multiarray using iterators?
In my case, I need to do a full pass over all elements of multiarray each time.
The fastest way to access every element of a boost::multi_array
is via data()
and num_elements()
.
With data()
you gain access to the underlying raw storage (a contiguous block that contains the array‘s data) so there isn't need for multiple index calculations (consider also that multi_array
can index arrays from bases different from 0 and this is a further complication).
A simple test gives:
g++ -O3 -fomit-frame-pointer -march=native (GCC v4.8.2)
Writing (index): 9.70651
Writing (data): 2.22353
Reading (index): 4.5973 (found 1)
Reading (data): 3.53811 (found 1)
clang++ -O3 -fomit-frame-pointer -march=native (CLANG v3.3)
Writing (index): 5.49858
Writing (data): 2.13678
Reading (index): 5.07324 (found 1)
Reading (data): 2.55109 (found 1)
By default boost access methods perform range checking. If a supplied index is out of the range defined for an array, an assertion will abort the program. To disable range checking, you can define the BOOST_DISABLE_ASSERTS
preprocessor macro prior to including multi_array.hpp
in your application.
This will reduce a lot the performance difference:
g++ -O3 -fomit-frame-pointer -march=native (GCC v4.8.2)
Writing (index): 3.15244
Writing (data): 2.23002
Reading (index): 1.89553 (found 1)
Reading (data): 1.54427 (found 1)
clang++ -O3 -fomit-frame-pointer -march=native (CLANG v3.3)
Writing (index): 2.24831
Writing (data): 2.12853
Reading (index): 2.59164 (found 1)
Reading (data): 2.52141 (found 1)
Performance difference increases (i.e. data()
is faster):
Anyway this kind of optimization is unlikely to make a measurable difference in a real program. You shouldn't worry about this unless you've conclusively determined, through extensive testing, that it is the source of some sort of bottleneck.
Source:
#include <chrono>
#include <iostream>
// #define BOOST_DISABLE_ASSERTS
#include <boost/multi_array.hpp>
int main()
{
using array3 = boost::multi_array<unsigned, 3>;
using index = array3::index;
using clock = std::chrono::high_resolution_clock;
using duration = std::chrono::duration<double>;
constexpr unsigned d1(300), d2(400), d3(200), sup(100);
array3 A(boost::extents[d1][d2][d3]);
// Writing via index
const auto t_begin1(clock::now());
unsigned values1(0);
for (unsigned n(0); n < sup; ++n)
for (index i(0); i != d1; ++i)
for (index j(0); j != d2; ++j)
for (index k(0); k != d3; ++k)
A[i][j][k] = ++values1;
const auto t_end1(clock::now());
// Writing directly
const auto t_begin2(clock::now());
unsigned values2(0);
for (unsigned n(0); n < sup; ++n)
{
const auto sup(A.data() + A.num_elements());
for (auto i(A.data()); i != sup; ++i)
*i = ++values2;
}
const auto t_end2(clock::now());
// Reading via index
const auto t_begin3(clock::now());
bool found1(false);
for (unsigned n(0); n < sup; ++n)
for (index i(0); i != d1; ++i)
for (index j(0); j != d2; ++j)
for (index k(0); k != d3; ++k)
if (A[i][j][k] == values1)
found1 = true;
const auto t_end3(clock::now());
// Reading directly
const auto t_begin4(clock::now());
bool found2(false);
for (unsigned n(0); n < sup; ++n)
{
const auto sup(A.data() + A.num_elements());
for (auto i(A.data()); i != sup; ++i)
if (*i == values2)
found2 = true;
}
const auto t_end4(clock::now());
std::cout << "Writing (index): "
<< std::chrono::duration_cast<duration>(t_end1 - t_begin1).count()
<< std::endl
<< "Writing (data): "
<< std::chrono::duration_cast<duration>(t_end2 - t_begin2).count()
<< std::endl
<< "Reading (index): "
<< std::chrono::duration_cast<duration>(t_end3 - t_begin3).count()
<< " (found " << found1 << ")" << std::endl
<< "Reading (data): "
<< std::chrono::duration_cast<duration>(t_end4 - t_begin4).count()
<< " (found " << found2 << ")" << std::endl;
return 0;
}