I've read the article on Wikipedia on the Duff's device, and I don't get it. I am really interested, but I've read the explanation there a couple of times and I still don't get it how the Duff's device works.
What would a more detailed explanation be?
1: Duffs device is a particular implimentation of loop unrolling. What is loop unrolling?
If you have an operation to perform N times in a loop you can trade program size for speed by executing the loop N/n times and then in the loop inlining (unrolling) the loop code n times e.g. replacing:
with
Which works great if N % n == 0 - no need for Duff! If that is not true then you have to handle the remainder - which is a pain.
2: How does Duffs device differ from this standard loop unrolling?
Duffs device is just a clever way of dealing with the remainder loop cycles when N % n != 0. The whole do / while executes N / n number of times as per standard loop unrolling (because the case 0 applies). On the last run through the loop (the 'N/n+1'th time) the case kicks in and we jump to the N % n case and run the loop code the 'remainder' number of times.
The point of duffs device is to reduce the number of comparisons done in a tight memcpy implementation.
Suppose you want to copy 'count' bytes from a to b, the straight forward approach is to do the following:
How many times do you need to compare count to see if it's a above 0? 'count' times.
Now, the duff device uses a nasty unintentional side effect of a switch case which allows you to reduce the number of comparisons needed to count / 8.
Now suppose you want to copy 20 bytes using duffs device, how many comparisons would you need? Only 3, since you copy eight bytes at a time except the
lastfirst one where you copy just 4.UPDATED: You don't have to do 8 comparisons/case-in-switch statements, but it's reasonable a trade-off between function size and speed.
The explanation in Dr. Dobb's Journal is the best that I found on the topic.
This being my AHA moment:
becomes:
becomes:
When I read it for the first time, I autoformatted it to this
and I had no idea what was happening.
Maybe not when this question was asked, but now Wikipedia has a very good explanation
There are some good explanations elsewhere, but let me give it a try. (This is a lot easier on a whiteboard!) Here's the Wikipedia example with some notations.
Let's say you're copying 20 bytes. The flow control of the program for the first pass is:
Now, start the second pass, we run just the indicated code:
Now, start the third pass:
20 bytes are now copied.
Note: The original Duff's Device (shown above) copied to an I/O device at the
to
address. Thus, it wasn't necessary to increment the pointer*to
. When copying between two memory buffers you'd need to use*to++
.Here's a non-detailed explanation which is what I feel to be the crux of Duff's device:
The thing is, C is basically a nice facade for assembly language (PDP-7 assembly to be specific; if you studied that you would see how striking the similarities are). And, in assembly language, you don't really have loops - you have labels and conditional-branch instructions. So the loop is just a part of the overall sequence of instructions with a label and a branch somewhere:
and a switch instruction is branching/jumping ahead somewhat:
In assembly it's easily conceivable how to combine these two control structures, and when you think of it that way, their combination in C doesn't seem so weird anymore.