What is data oriented design?

Posted 2019-09-02 04:00

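I was reading this article, and the author keeps talking about how everyone could benefit greatly from mixing data oriented design with OOP. He doesn't show any code samples, however.

I googled this and couldn't find any real information on what it is, let alone any code samples. Is anyone familiar with this term who can provide an example? Is it perhaps a different name for something else?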

Answer 1:

First of all, don't confuse this with data driven design.

My understanding of Data Oriented Design is that it is about organizing your data for efficient processing, especially with respect to cache misses. Data Driven Design, on the other hand, is about letting data control a lot of your program's behavior (described very well by Andrew Keith's answer).

Say you have ball objects in your application with properties such as color, radius, bounciness, position etc.

Object Oriented Approach

In OOP you would describe your balls like this:

class Ball {
  Point  position;
  Color  color;
  double radius;

  void draw();
};

And then you would create a collection of balls like this:

vector<Ball> balls;

Data Oriented Approach

In Data Oriented Design, however, you are more likely to write the code like this:

class Balls {
  vector<Point>  position;
  vector<Color>  color;
  vector<double> radius;

  void draw();
};

As you can see there is no single unit representing one Ball anymore. Ball objects only exist implicitly.

This can have many advantages performance-wise. First, we usually want to do operations on many balls at the same time, and hardware usually wants large contiguous chunks of memory to operate efficiently.

Secondly, you might do operations that affect only some of a ball's properties. E.g. if you combine the colors of all the balls in various ways, then you want your cache to contain only color information. However, when all ball properties are stored in one unit, you will pull in all the other properties of a ball as well, even though you don't need them.
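To make the color case concrete, here is a minimal sketch of the Balls layout with one color-only operation. The Point and Color definitions, the add helper and the darken function are my own assumptions for illustration; they are not from the article.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the Point and Color types used above.
struct Point { float x, y; };
struct Color { std::uint8_t r, g, b; };

// Structure-of-arrays layout: ball i is whatever sits at index i of each vector.
struct Balls {
    std::vector<Point>  position;
    std::vector<Color>  color;
    std::vector<double> radius;

    // Appends one ball by pushing one element onto every array.
    void add(Point p, Color c, double r) {
        position.push_back(p);
        color.push_back(c);
        radius.push_back(r);
    }

    std::size_t size() const { return color.size(); }
};

// Halves each color channel. Only the color array is read or written,
// so this loop never touches the position or radius storage.
void darken(Balls& balls) {
    for (Color& c : balls.color) {
        c.r /= 2;
        c.g /= 2;
        c.b /= 2;
    }
}
```

With vector<Ball> the same loop would stride over every ball's position and radius just to reach the next color.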

Cache Usage Example

Say each ball takes up 64 bytes and a Point takes 4 bytes. A cache slot takes, say, 64 bytes as well. If I want to update the position of 10 balls, I have to pull 10*64 = 640 bytes of memory into cache and get 10 cache misses. If, however, I can work on the positions of the balls as separate units, that will only take 4*10 = 40 bytes, which fits in one cache fetch. Thus we get only 1 cache miss to update all 10 balls. These numbers are arbitrary; I assume a real cache block is bigger.

But it illustrates how memory layout can have a severe effect on cache hits and thus on performance. This will only increase in importance as the difference between CPU and RAM speed widens.
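The arithmetic can be checked with a small helper that counts how many distinct cache slots a sequential pass touches. This is only a model under the same arbitrary numbers; the function name and its assumption that the data starts on a slot boundary are mine.

```cpp
#include <cstddef>

// Counts the distinct cache lines touched when reading `count` elements of
// `elem_size` bytes placed `stride` bytes apart. Assumes the first element
// starts on a cache-line boundary and elem_size is at least 1.
std::size_t cache_lines_touched(std::size_t count, std::size_t stride,
                                std::size_t elem_size, std::size_t line_size) {
    std::size_t lines = 0;
    std::size_t last_line = 0;
    bool any = false;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t first = (i * stride) / line_size;
        const std::size_t last  = (i * stride + elem_size - 1) / line_size;
        // Addresses only grow, so a line is new exactly when it lies
        // beyond the last line already counted.
        for (std::size_t line = first; line <= last; ++line) {
            if (!any || line > last_line) {
                ++lines;
                last_line = line;
                any = true;
            }
        }
    }
    return lines;
}
```

For the numbers above, cache_lines_touched(10, 64, 4, 64) models the positions embedded in 64-byte balls and gives 10, while cache_lines_touched(10, 4, 4, 64) models a packed position array and gives 1.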

How to lay out the memory

In my ball example I simplified the issue a lot, because in any normal app you will likely access multiple variables together. E.g. position and radius will probably be used together frequently. Then your structure should be:

class Body {
  Point  position;
  double radius;
};

class Balls {
  vector<Body>  bodies;
  vector<Color>  color;

  void draw();
};

The reason you should do this is that if data used together are placed in separate arrays, there is a risk that they will compete for the same slots in the cache. Thus loading one will throw out the other.
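As a rough sketch of why the grouping helps, here is an operation that needs position and radius together and nothing else. The Point and Color definitions and the count_containing function are hypothetical, made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the Point and Color types used above.
struct Point { float x, y; };
struct Color { unsigned char r, g, b; };

// Fields that are used together live together.
struct Body {
    Point  position;
    double radius;
};

struct Balls {
    std::vector<Body>  bodies;
    std::vector<Color> color;
};

// Counts the balls that contain point p. Each iteration reads one Body,
// so position and radius arrive from the same array and the color array
// is never touched.
std::size_t count_containing(const Balls& balls, Point p) {
    std::size_t hits = 0;
    for (const Body& b : balls.bodies) {
        const double dx = b.position.x - p.x;
        const double dy = b.position.y - p.y;
        if (dx * dx + dy * dy <= b.radius * b.radius) {
            ++hits;
        }
    }
    return hits;
}
```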

So, compared to Object Oriented programming, the classes you end up making are not related to the entities in your mental model of the problem. Since data is lumped together based on how it is used, you won't always have sensible names to give your classes in Data Oriented Design.

Relation to relational databases

The thinking behind Data Oriented Design is very similar to how you think about relational databases. Optimizing a relational database can also involve using the cache more efficiently, although in this case the cache is not the CPU cache but pages in memory. A good database designer will also likely split out infrequently accessed data into a separate table rather than creating a table with a huge number of columns where only a few of the columns are ever used. He might also choose to denormalize some of the tables so that data doesn't have to be accessed from multiple locations on disk. Just like with Data Oriented Design, these choices are made by looking at what the data access patterns are and where the performance bottleneck is.



Answer 2:

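Mike Acton recently gave a public talk about data oriented design:

My basic summary of it would be: if you want performance, think about data flow, find the storage layer that is most likely to screw with you, and optimize for it hard. Mike is focusing on L2 cache misses because he's doing realtime work, but I imagine the same thing applies to databases (disk reads) and even the Web (HTTP requests). It's a useful way of doing systems programming, I think.

Note that it doesn't absolve you from thinking about algorithms and time complexity. It just focuses your attention on figuring out the most expensive operation type, which you then must target with your mad CS skills.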



Answer 3:

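I just want to point out that Noel is talking specifically about some of the needs we face in game development. I suppose other sectors that do real-time simulation would benefit from this, but it is unlikely to be a technique that shows noticeable improvement in general business applications. This setup is for ensuring that every last bit of performance is squeezed out of the underlying hardware.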



Answer 4:

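A data oriented design is a design in which the logic of the application is built up from data sets instead of procedural algorithms. For example:

Procedural approach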

int animation; // this value is the animation index

if (animation == 0)
  PerformMoveForward();
else if (animation == 1)
  PerformMoveBack();
// ... etc

Data design approach

typedef struct
{
   int Index;
   void (*Perform)();
} AnimationIndice;

// build my animation dictionary
AnimationIndice AnimationIndices[] =
  {
      { 0, PerformMoveForward },
      { 1, PerformMoveBack }
  };

// when it's time to run, I use my dictionary to find my logic
int animation = 0; // this value is the animation index (0 here as an example)
AnimationIndices[animation].Perform();

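Data designs like this promote the use of data to build the logic of the application. It's easier to manage, especially in video games, which might have thousands of logic paths based on animation or some other factor.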
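For what it's worth, the same lookup-table idea can be sketched in C++ with a bounds check added. The handlers here are hypothetical stand-ins that return a label instead of animating anything, and kAnimationTable and RunAnimation are names I made up.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical handlers standing in for PerformMoveForward / PerformMoveBack.
// They return a label so the dispatch result is easy to inspect.
std::string PerformMoveForward() { return "forward"; }
std::string PerformMoveBack() { return "back"; }

using AnimationFn = std::string (*)();

// The table is the data: the position in the array is the animation index.
constexpr std::array<AnimationFn, 2> kAnimationTable = {
    PerformMoveForward,
    PerformMoveBack,
};

// Looks up and runs the handler for `animation`.
// Returns an empty string when the index is outside the table.
std::string RunAnimation(std::size_t animation) {
    if (animation >= kAnimationTable.size()) {
        return {};
    }
    return kAnimationTable[animation]();
}
```

Adding a new animation then means adding one row to the table rather than another else-if branch.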



Source: What is data oriented design?