How does Duff's device work?

2018-12-31 08:36发布

I've read the article on Wikipedia on the Duff's device, and I don't get it. I am really interested, but I've read the explanation there a couple of times and I still don't get it how the Duff's device works.

What would a more detailed explanation be?

9条回答
心情的温度
2楼-- · 2018-12-31 08:47

1: Duffs device is a particular implimentation of loop unrolling. What is loop unrolling?
If you have an operation to perform N times in a loop you can trade program size for speed by executing the loop N/n times and then in the loop inlining (unrolling) the loop code n times e.g. replacing:

for (int i=0; i<N; i++) {
    // [The loop code...] 
}

with

for (int i=0; i<N/n; i++) {
    // [The loop code...]
    // [The loop code...]
    // [The loop code...]
    ...
    // [The loop code...] // n times!
}

Which works great if N % n == 0 - no need for Duff! If that is not true then you have to handle the remainder - which is a pain.

2: How does Duffs device differ from this standard loop unrolling?
Duffs device is just a clever way of dealing with the remainder loop cycles when N % n != 0. The whole do / while executes N / n number of times as per standard loop unrolling (because the case 0 applies). On the last run through the loop (the 'N/n+1'th time) the case kicks in and we jump to the N % n case and run the loop code the 'remainder' number of times.

查看更多
君临天下
3楼-- · 2018-12-31 08:48

The point of duffs device is to reduce the number of comparisons done in a tight memcpy implementation.

Suppose you want to copy 'count' bytes from a to b, the straight forward approach is to do the following:

  do {                      
      *a = *b++;            
  } while (--count > 0);

How many times do you need to compare count to see if it's a above 0? 'count' times.

Now, the duff device uses a nasty unintentional side effect of a switch case which allows you to reduce the number of comparisons needed to count / 8.

Now suppose you want to copy 20 bytes using duffs device, how many comparisons would you need? Only 3, since you copy eight bytes at a time except the last first one where you copy just 4.

UPDATED: You don't have to do 8 comparisons/case-in-switch statements, but it's reasonable a trade-off between function size and speed.

查看更多
姐姐魅力值爆表
4楼-- · 2018-12-31 08:50

The explanation in Dr. Dobb's Journal is the best that I found on the topic.

This being my AHA moment:

for (i = 0; i < len; ++i) {
    HAL_IO_PORT = *pSource++;
}

becomes:

int n = len / 8;
for (i = 0; i < n; ++i) {
    HAL_IO_PORT = *pSource++;
    HAL_IO_PORT = *pSource++;
    HAL_IO_PORT = *pSource++;
    HAL_IO_PORT = *pSource++;
    HAL_IO_PORT = *pSource++;
    HAL_IO_PORT = *pSource++;
    HAL_IO_PORT = *pSource++;
    HAL_IO_PORT = *pSource++;
}

n = len % 8;
for (i = 0; i < n; ++i) {
    HAL_IO_PORT = *pSource++;
}

becomes:

int n = (len + 8 - 1) / 8;
switch (len % 8) {
    case 0: do { HAL_IO_PORT = *pSource++;
    case 7: HAL_IO_PORT = *pSource++;
    case 6: HAL_IO_PORT = *pSource++;
    case 5: HAL_IO_PORT = *pSource++;
    case 4: HAL_IO_PORT = *pSource++;
    case 3: HAL_IO_PORT = *pSource++;
    case 2: HAL_IO_PORT = *pSource++;
    case 1: HAL_IO_PORT = *pSource++;
               } while (--n > 0);
}
查看更多
看风景的人
5楼-- · 2018-12-31 08:50

When I read it for the first time, I autoformatted it to this

void dsend(char* to, char* from, count) {
    int n = (count + 7) / 8;
    switch (count % 8) {
        case 0: do {
                *to = *from++;
                case 7: *to = *from++;
                case 6: *to = *from++;
                case 5: *to = *from++;
                case 4: *to = *from++;
                case 3: *to = *from++;
                case 2: *to = *from++;
                case 1: *to = *from++;
            } while (--n > 0);
    }
}

and I had no idea what was happening.

Maybe not when this question was asked, but now Wikipedia has a very good explanation

The device is valid, legal C by virtue of two attributes in C:

  • Relaxed specification of the switch statement in the language's definition. At the time of the device's invention this was the first edition of The C Programming Language which requires only that the controlled statement of the switch be a syntactically valid (compound) statement within which case labels can appear prefixing any sub-statement. In conjunction with the fact that, in the absence of a break statement, the flow of control will fall-through from a statement controlled by one case label to that controlled by the next, this means that the code specifies a succession of count copies from sequential source addresses to the memory-mapped output port.
  • The ability to legally jump into the middle of a loop in C.
查看更多
与君花间醉酒
6楼-- · 2018-12-31 08:51

There are some good explanations elsewhere, but let me give it a try. (This is a lot easier on a whiteboard!) Here's the Wikipedia example with some notations.

Let's say you're copying 20 bytes. The flow control of the program for the first pass is:

int count;                        // Set to 20
{
    int n = (count + 7) / 8;      // n is now 3.  (The "while" is going
                                  //              to be run three times.)

    switch (count % 8) {          // The remainder is 4 (20 modulo 8) so
                                  // jump to the case 4

    case 0:                       // [skipped]
             do {                 // [skipped]
                 *to = *from++;   // [skipped]
    case 7:      *to = *from++;   // [skipped]
    case 6:      *to = *from++;   // [skipped]
    case 5:      *to = *from++;   // [skipped]
    case 4:      *to = *from++;   // Start here.  Copy 1 byte  (total 1)
    case 3:      *to = *from++;   // Copy 1 byte (total 2)
    case 2:      *to = *from++;   // Copy 1 byte (total 3)
    case 1:      *to = *from++;   // Copy 1 byte (total 4)
           } while (--n > 0);     // N = 3 Reduce N by 1, then jump up
                                  //       to the "do" if it's still
    }                             //        greater than 0 (and it is)
}

Now, start the second pass, we run just the indicated code:

int count;                        //
{
    int n = (count + 7) / 8;      //
                                  //

    switch (count % 8) {          //
                                  //

    case 0:                       //
             do {                 // The while jumps to here.
                 *to = *from++;   // Copy 1 byte (total 5)
    case 7:      *to = *from++;   // Copy 1 byte (total 6)
    case 6:      *to = *from++;   // Copy 1 byte (total 7)
    case 5:      *to = *from++;   // Copy 1 byte (total 8)
    case 4:      *to = *from++;   // Copy 1 byte (total 9)
    case 3:      *to = *from++;   // Copy 1 byte (total 10)
    case 2:      *to = *from++;   // Copy 1 byte (total 11)
    case 1:      *to = *from++;   // Copy 1 byte (total 12)
           } while (--n > 0);     // N = 2 Reduce N by 1, then jump up
                                  //       to the "do" if it's still
    }                             //       greater than 0 (and it is)
}

Now, start the third pass:

int count;                        //
{
    int n = (count + 7) / 8;      //
                                  //

    switch (count % 8) {          //
                                  //

    case 0:                       //
             do {                 // The while jumps to here.
                 *to = *from++;   // Copy 1 byte (total 13)
    case 7:      *to = *from++;   // Copy 1 byte (total 14)
    case 6:      *to = *from++;   // Copy 1 byte (total 15)
    case 5:      *to = *from++;   // Copy 1 byte (total 16)
    case 4:      *to = *from++;   // Copy 1 byte (total 17)
    case 3:      *to = *from++;   // Copy 1 byte (total 18)
    case 2:      *to = *from++;   // Copy 1 byte (total 19)
    case 1:      *to = *from++;   // Copy 1 byte (total 20)
           } while (--n > 0);     // N = 1  Reduce N by 1, then jump up
                                  //       to the "do" if it's still
    }                             //       greater than 0 (and it's not, so bail)
}                                 // continue here...

20 bytes are now copied.

Note: The original Duff's Device (shown above) copied to an I/O device at the to address. Thus, it wasn't necessary to increment the pointer *to. When copying between two memory buffers you'd need to use *to++.

查看更多
低头抚发
7楼-- · 2018-12-31 08:52

Here's a non-detailed explanation which is what I feel to be the crux of Duff's device:

The thing is, C is basically a nice facade for assembly language (PDP-7 assembly to be specific; if you studied that you would see how striking the similarities are). And, in assembly language, you don't really have loops - you have labels and conditional-branch instructions. So the loop is just a part of the overall sequence of instructions with a label and a branch somewhere:

        instruction
label1: instruction
        instruction
        instruction
        instruction
        jump to label1  some condition

and a switch instruction is branching/jumping ahead somewhat:

        evaluate expression into register r
        compare r with first case value
        branch to first case label if equal
        compare r with second case value
        branch to second case label if equal
        etc....
first_case_label: 
        instruction
        instruction
second_case_label: 
        instruction
        instruction
        etc...

In assembly it's easily conceivable how to combine these two control structures, and when you think of it that way, their combination in C doesn't seem so weird anymore.

查看更多
登录 后发表回答