Fastest method of accessing elements in Boost MultiArray

Posted 2019-10-20 02:50

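Which is faster: accessing the elements of a multi_array with the element selection operators, or traversing the multi_array with iterators?

In my case, I need to do a full pass over all elements of the multi_array every time.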

Answer 1:

The fastest way to access every element of a boost::multi_array is through data() and num_elements().

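data() gives you access to the underlying raw storage (a contiguous block that holds the array's data), so there is no need for multiple index calculations. Also consider that a multi_array can index arrays from bases other than 0, which is a further complication.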
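To make the cost of index calculations concrete, here is a rough sketch of the arithmetic an indexed access implies for a 3-D array. It is an illustration only, not Boost's actual implementation; layout3, row_major and offset are hypothetical names, and a row-major layout is assumed.

```cpp
#include <array>
#include <cstddef>

// Hypothetical descriptor of a 3-D view over flat storage (not a Boost type).
struct layout3
{
  std::array<std::ptrdiff_t, 3> base;    // first valid index in each dimension
  std::array<std::ptrdiff_t, 3> stride;  // elements skipped per unit step
};

// Row-major layout: the last index varies fastest.
// The first extent is not needed to compute the strides.
inline layout3 row_major(std::ptrdiff_t d2, std::ptrdiff_t d3,
                         std::ptrdiff_t base = 0)
{
  return layout3{{{base, base, base}}, {{d2 * d3, d3, 1}}};
}

// Offset of element (i, j, k) in the flat storage:
// three subtractions, three multiplications and two additions per access.
inline std::ptrdiff_t offset(const layout3& l, std::ptrdiff_t i,
                             std::ptrdiff_t j, std::ptrdiff_t k)
{
  return (i - l.base[0]) * l.stride[0]
       + (j - l.base[1]) * l.stride[1]
       + (k - l.base[2]) * l.stride[2];
}
```

A pointer walk over the raw storage replaces all of this with a single increment per element. An optimizing compiler may hoist part of the arithmetic out of the inner loop, which is one reason the measured gap varies between compilers.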

A simple test gives (times in seconds):

g++ -O3 -fomit-frame-pointer -march=native   (GCC v4.8.2)
Writing (index): 9.70651
Writing (data):  2.22353
Reading (index): 4.5973 (found 1)
Reading (data):  3.53811 (found 1)

clang++ -O3 -fomit-frame-pointer -march=native   (CLANG v3.3)
Writing (index): 5.49858
Writing (data):  2.13678
Reading (index): 5.07324 (found 1)
Reading (data):  2.55109 (found 1)

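By default, the Boost access methods perform range checking. If a supplied index is outside the range defined for an array, an assertion aborts the program. To disable range checking, define the BOOST_DISABLE_ASSERTS preprocessor macro before including multi_array.hpp in your application.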

This greatly reduces the performance difference:

g++ -O3 -fomit-frame-pointer -march=native   (GCC v4.8.2)
Writing (index): 3.15244
Writing (data):  2.23002
Reading (index): 1.89553 (found 1)
Reading (data):  1.54427 (found 1)

clang++ -O3 -fomit-frame-pointer -march=native   (CLANG v3.3)
Writing (index): 2.24831
Writing (data):  2.12853
Reading (index): 2.59164 (found 1)
Reading (data):  2.52141 (found 1)

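The performance difference increases (i.e. data() becomes comparatively faster):

  • with a higher number of dimensions;
  • with fewer elements. With a large number of elements, the cost of accessing them is less significant than the cache pressure of loading them into the CPU cache: the prefetcher sits there trying to load those elements, and that takes a large share of the time.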

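In any case, this kind of optimization is unlikely to make a measurable difference in a real program. You should not worry about it unless extensive testing has conclusively shown that it is the source of a bottleneck.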

Source code:

#include <chrono>
#include <iostream>

// #define BOOST_DISABLE_ASSERTS
#include <boost/multi_array.hpp>

int main()
{
  using array3 = boost::multi_array<unsigned, 3>;
  using index = array3::index;

  using clock = std::chrono::high_resolution_clock;
  using duration = std::chrono::duration<double>;

  constexpr unsigned d1(300), d2(400), d3(200), sup(100);

  array3 A(boost::extents[d1][d2][d3]);

  // Writing via index
  const auto t_begin1(clock::now());
  unsigned values1(0);
  for (unsigned n(0); n < sup; ++n)
    for (index i(0); i != d1; ++i)
      for (index j(0); j != d2; ++j)
        for (index k(0); k != d3; ++k)
          A[i][j][k] = ++values1;
  const auto t_end1(clock::now());

  // Writing directly
  const auto t_begin2(clock::now());
  unsigned values2(0);
  for (unsigned n(0); n < sup; ++n)
  {
    // Named `end` so it does not shadow the repetition count `sup`.
    const auto end(A.data() + A.num_elements());

    for (auto i(A.data()); i != end; ++i)
      *i = ++values2;
  }
  const auto t_end2(clock::now());

  // Reading via index
  const auto t_begin3(clock::now());
  bool found1(false);
  for (unsigned n(0); n < sup; ++n)
    for (index i(0); i != d1; ++i)
      for (index j(0); j != d2; ++j)
        for (index k(0); k != d3; ++k)
          if (A[i][j][k] == values1)
            found1 = true;
  const auto t_end3(clock::now());

  // Reading directly
  const auto t_begin4(clock::now());
  bool found2(false);
  for (unsigned n(0); n < sup; ++n)
  {
    // Named `end` so it does not shadow the repetition count `sup`.
    const auto end(A.data() + A.num_elements());

    for (auto i(A.data()); i != end; ++i)
      if (*i == values2)
        found2 = true;
  }
  const auto t_end4(clock::now());

  std::cout << "Writing (index): "
            << std::chrono::duration_cast<duration>(t_end1 - t_begin1).count()
            << std::endl
            << "Writing (data):  "
            << std::chrono::duration_cast<duration>(t_end2 - t_begin2).count()
            << std::endl
            << "Reading (index): "
            << std::chrono::duration_cast<duration>(t_end3 - t_begin3).count()
            << " (found " << found1 << ")" << std::endl
            << "Reading (data):  "
            << std::chrono::duration_cast<duration>(t_end4 - t_begin4).count()
            << " (found " << found2 << ")" << std::endl;

  return 0;
}


Source: Fastest method of accessing elements in Boost MultiArray