Question:
Consider the following problem. You have a bit-string that represents the current scheduled slave in one-hot encoding. For example, "00000100" (with the leftmost bit being #7 and rightmost #0) means that slave #2 is scheduled.
Now, I want to pick the next scheduled slave in a round-robin scheduling scheme, with a twist. I have a "request mask" which says which slaves actually want to be scheduled. The next slave will be picked only from those that want to.
Some examples (assume round-robin scheduling is done by rotating left).
Example1:
- Current: "00000100"
- Mask: "01100000"
- Next schedule: "00100000" - in normal round-robin, #3 and then #4 should come after #2, but they don't request, so #5 is picked.
Example2:
- Current: "01000000"
- Mask: "00001010"
- Next: "00000010" - because scheduling is done by cycling left, and #1 is the first requesting slave in that order.
Now, this can be easily coded in a loop, I know. But I actually want to get my result by a bit-twiddling operation, without loops. The motivation: I want to implement this in hardware (in an FPGA) in VHDL/Verilog.
A bonus would be an algorithm that is generic for any number of slaves N.
By the way, this is not a homework question. It's an important problem whenever one wants to schedule slaves in some manner, and condition the scheduling by the slaves' requests. My current solution is somewhat "heavy" and I wanted to know if I'm missing something obvious.
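For reference, the loop-based version mentioned above could look like the following C sketch. The names `next_slave_loop`, `rotl1` and `N_SLAVES` are made up for illustration, and the code assumes 8 slaves and a one-hot `current`. It is only a behavioural model to compare other solutions against, not hardware.

```c
#include <stdint.h>

#define N_SLAVES 8  /* assumed width, matching the examples above */

/* Rotate an N_SLAVES-bit word left by one position. */
static uint8_t rotl1(uint8_t w)
{
    return (uint8_t)((w << 1) | (w >> (N_SLAVES - 1)));
}

/* Step the one-hot `current` left until it lands on a requesting slave.
 * Returns 0 when no slave requests. */
uint8_t next_slave_loop(uint8_t current, uint8_t mask)
{
    uint8_t candidate = current;
    for (int i = 0; i < N_SLAVES; i++) {
        candidate = rotl1(candidate);
        if (candidate & mask)
            return candidate;
    }
    return 0;
}
```

With the two examples from the question, `next_slave_loop(0x04, 0x60)` gives `0x20` and `next_slave_loop(0x40, 0x0A)` gives `0x02`.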
Answer 1:
A loop does not have to be bad.
I would simply do
next[i] = mask[i] & current[i-1] |               // normal shift logic
          mask[i] & current[i-2] & !mask[i-1] |  // slave i-1 does not request, so skip it
          ...                                    // one more term per remaining distance,
                                                 // with all indices taken modulo N
And then put it into a generate loop (i.e. it will get unrolled into hardware), which will produce parallel hardware for the expressions.
The other solutions mentioned here use one or more subtractions ("-"). I can only discourage that, as it gets you a really expensive operation. Especially with one-hot encoding you can easily end up with more than 32 bits, which is not easy to implement in hardware because the borrow has to propagate through all bits (the dedicated carry logic on certain FPGAs makes it manageable for a small number of bits).
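One plausible reading of the elided terms is that output bit i is granted when slave i requests and, walking back from i, the current slave is reached before any other requester. The C model below spells that out. The names `bit`, `next_slave_unrolled` and `N_SLAVES` are assumptions for illustration, and the two nested loops only stand in for what a generate loop would unroll into parallel logic.

```c
#include <stdint.h>

#define N_SLAVES 8  /* assumed width */

/* Bit i of w, with the index wrapping modulo N_SLAVES (so i = -1 is the top bit). */
static unsigned bit(uint8_t w, int i)
{
    return (w >> (((i % N_SLAVES) + N_SLAVES) % N_SLAVES)) & 1u;
}

uint8_t next_slave_unrolled(uint8_t current, uint8_t mask)
{
    uint8_t next = 0;
    for (int i = 0; i < N_SLAVES; i++) {       /* one output bit per pass */
        unsigned grant = 0;
        unsigned gap_idle = 1;                 /* no requester strictly between i-d and i */
        for (int d = 1; d <= N_SLAVES; d++) {  /* distance back to the current slave */
            grant |= bit(current, i - d) & gap_idle;
            gap_idle &= !bit(mask, i - d);
        }
        if (bit(mask, i) & grant)
            next |= (uint8_t)(1u << i);
    }
    return next;
}
```

The term for distance d has d+1 inputs, so the total logic grows roughly quadratically with N in this flat form; whether that is acceptable depends on N and on how well the synthesis tool shares the common sub-terms.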
Answer 2:
I've found the following Verilog code for implementing the task in the Altera advanced synthesis cookbook.
// 'base' is a one hot signal indicating the first request
// that should be considered for a grant. Followed by higher
// indexed requests, then wrapping around.
//
module arbiter (
req, grant, base
);
parameter WIDTH = 16;
input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;
input [WIDTH-1:0] base;
wire [2*WIDTH-1:0] double_req = {req,req};
wire [2*WIDTH-1:0] double_grant = double_req & ~(double_req-base);
assign grant = double_grant[WIDTH-1:0] | double_grant[2*WIDTH-1:WIDTH];
endmodule
It uses subtraction (only once, though), so conceptually it's quite similar to Doug's solution.
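To see why the subtraction works, it can help to model the module in C. In the sketch below `WIDTH` is fixed at 8 and the wrapper `next_slave` is an assumption for illustration: since `base` marks the first request to consider, it is taken to be `current` rotated left by one. Subtracting `base` borrows through the zero bits above it until it hits the first set request bit, which flips from 1 to 0, and `x & ~(x - base)` isolates exactly that bit.

```c
#include <stdint.h>

#define WIDTH 8  /* assumed width for this model */

/* C model of the arbiter module: req doubled to 2*WIDTH bits, one subtraction. */
uint8_t arbiter_grant(uint8_t req, uint8_t base)
{
    uint16_t double_req   = (uint16_t)((req << WIDTH) | req);
    uint16_t double_grant = (uint16_t)(double_req & ~(double_req - base));
    /* fold the upper copy back onto the lower one */
    return (uint8_t)(double_grant | (double_grant >> WIDTH));
}

/* Hypothetical wrapper: start the search one position left of current. */
uint8_t next_slave(uint8_t current, uint8_t req)
{
    uint8_t base = (uint8_t)((current << 1) | (current >> (WIDTH - 1)));
    return arbiter_grant(req, base);
}
```

For Example 2 of the question, `base` is `0x80`, `double_req` is `0x0A0A`, the subtraction yields `0x098A`, and the grant comes out of the upper copy as `0x0200`, which folds to `0x02`.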
Answer 3:
The following solution works for any number of slaves (N), and needs O(N) gates in your FPGA. For each bit in the field, you will require three logic gates and two inverters. I tested out the concept with a basic logic simulator, and it works.
The chain of logic gates between current and mask essentially creates a priority system that favors bits "lower down" in the chain. This chain is looped at the ends, but the current bits are used to break the chain.
To visualize the operation, imagine that bit 3 is set in the current field, and follow the signal downwards in the diagram. The logical one at bit 3 places a logical zero at the input to the first AND gate, which guarantees that the output of that AND gate will also be zero (this is where the OR-gate chain is broken). The zero at the output of the first AND gate places a one at the input to the second AND gate. This makes bit 2 of next directly dependent on bit 2 of mask.
Now, the chain of OR gates comes into play.
If bit 2 of mask was set, the logical output of the OR gate directly to the left of it will also be a one, which will place a logical one at the input to the AND gate below bit 2 of current (which will be zero, since only one bit in current can be set at a time). The logical one at the output of the top AND gate places a logical zero at the input of the bottom AND gate, thus setting bit 1 of next equal to zero.
If bit 2 of mask was not set, both inputs to the OR gate would be zero, so the output of the AND gate below bit 2 of current would be a zero, placing a one at the input to the bottom AND gate, and therefore making bit 1 of next dependent on bit 1 of mask.
This logic follows the chain of OR gates "up" the bits, looping around from the left side back over to the right, ensuring that only one bit in next can be set to a one. The loop stops once it makes its way back to bit 3 of current, as a result of that bit being set. This prevents the circuit from staying in a perpetual loop.
I have no experience with Verilog or VHDL, so I'll leave the actual code up to you and the rest of stackoverflow.
[Logic diagram: http://img145.imageshack.us/img145/5125/bitshifterlogicdiagramkn7.jpg]
notes:
- This solution is only partial. It will still require some kind of latching mechanism to hold the bit fields.
- Keep in mind that as you increase the number of bits, the time required for the gate voltages to settle will also increase.
- There will have to be some logic in place to handle the case where the current field is equal to zero. See this stackoverflow question.
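Going only by the prose description, the chain can be modelled in C as an "inhibit" signal that ripples from the set bit of current toward lower bits, wrapping around at bit 0. This is a sketch under assumptions: the names `next_slave_chain` and `inhibit` are invented, the width is fixed at 8, and the loop merely evaluates the combinational chain in ripple order. Note that it follows the answer's walkthrough (bit 3, then 2, then 1), which is the opposite direction to the rotate-left order in the question.

```c
#include <stdint.h>

#define N_SLAVES 8  /* assumed width */

uint8_t next_slave_chain(uint8_t current, uint8_t mask)
{
    int start = -1;
    for (int i = 0; i < N_SLAVES; i++)
        if (current & (1u << i))
            start = i;                 /* the bit that breaks the chain */
    if (start < 0)
        return 0;                      /* current == 0 needs separate handling */

    uint8_t next = 0;
    unsigned inhibit = 0;              /* chain output just below the set bit */
    for (int step = 1; step <= N_SLAVES; step++) {
        int i = (start - step + N_SLAVES) % N_SLAVES;  /* walk toward lower bits */
        unsigned m = (mask >> i) & 1u;
        unsigned c = (current >> i) & 1u;
        if (m & !inhibit)
            next |= (uint8_t)(1u << i);   /* bottom AND gate: mask and not inhibited */
        inhibit = (inhibit | m) & !c;     /* OR gate, then AND with inverted current */
    }
    return next;
}
```

With bit 3 set in current, a mask of `0x06` yields `0x04` and a mask of `0x90` wraps around to `0x80`.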
Answer 4:
Interesting problem! I can't help but wonder whether you could simplify your scheduler so that this sort of operation would be unnecessary.
Given that you know VHDL, I won't go into detail, but my suggestion would be the following:
Use a 3 bit encoder to turn the currently scheduled task into a number:
01000000 --> 6
Then use a barrel shifter to rotate the mask right by that number + 1 (to skip the current task):
00001010 --> 00010100
Then use a priority encoder to find the first available "next" task:
00010100 --> 00000100 --> 2
Then reverse the barrel shift by addition:
(2+7) % 8 = 1
Which when re-encoded will give the next scheduled task:
00000010
This should be very fast and straightforward, although the barrel shifter is 'expensive' in terms of real estate; I don't see an easy way to get around that at the moment.
Edit: Doug's solution is significantly more elegant...
-Adam
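The encoder, barrel shifter and priority encoder steps above can be sketched in C as follows. The function names and the fixed width of 8 are assumptions for illustration, and the `while` loop in `encode_one_hot` only stands in for a hardware encoder. The rotation is taken to be a right rotation, which is what the 00001010 --> 00010100 example implies for a shift amount of 7.

```c
#include <stdint.h>

#define N_SLAVES 8  /* assumed width */

/* Index of the single set bit of a one-hot word (the encoder). */
static int encode_one_hot(uint8_t w)
{
    int idx = 0;
    while (w > 1) {
        w >>= 1;
        idx++;
    }
    return idx;
}

/* Rotate an N_SLAVES-bit word right by s positions (the barrel shifter). */
static uint8_t rotr(uint8_t w, int s)
{
    s %= N_SLAVES;
    return s ? (uint8_t)((w >> s) | (w << (N_SLAVES - s))) : w;
}

uint8_t next_slave_barrel(uint8_t current, uint8_t mask)
{
    if (current == 0 || mask == 0)
        return 0;                                          /* nothing to schedule */
    int cur = encode_one_hot(current);                     /* 01000000 -> 6 */
    uint8_t rotated = rotr(mask, cur + 1);                 /* 00001010 -> 00010100 */
    int first = encode_one_hot((uint8_t)(rotated & -rotated)); /* lowest set bit -> 2 */
    int next = (first + cur + 1) % N_SLAVES;               /* (2 + 7) % 8 = 1 */
    return (uint8_t)(1u << next);                          /* re-encode as one-hot */
}
```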
Answer 5:
Subtracting 1 is the essential idea here. It's used to cascade borrows through the bits to find the next task.
bits_before_current = ~(current-1) & ~current
bits_after_current = current-1
todo = (mask & bits_before_current)
if todo==0: todo = (mask & bits_after_current) // second part is if we have to wrap around
next = last_bit_of_todo = todo & -todo
This will use a loop internally though...
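Written out as C for 8-bit words, the pseudocode above might look like this. The function name `next_slave_borrow` and the local names `above` and `below` are assumptions for illustration. One property worth noticing: neither bit set includes the current slave itself, so if the current slave is the only requester the result is 0.

```c
#include <stdint.h>

uint8_t next_slave_borrow(uint8_t current, uint8_t mask)
{
    uint8_t below = (uint8_t)(current - 1);        /* bits_after_current: lower than current */
    uint8_t above = (uint8_t)(~below & ~current);  /* bits_before_current: higher than current */
    uint8_t todo  = mask & above;
    if (todo == 0)
        todo = mask & below;                       /* wrap around */
    return (uint8_t)(todo & -todo);                /* lowest set bit of todo */
}
```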
Answer 6:
Assuming two's complement representation, call your two words mask and current, in C:
unsigned next_slave(unsigned mask, unsigned current)
{
    unsigned mask_lo = (current << 1) - 1; // the bits to the right of and including current
    unsigned mask_hi = ~mask_lo;           // the bits to the left of current
    // prefer the left bits, otherwise wrap around to the right:
    unsigned next = (mask & mask_hi) ? (mask & mask_hi) : (mask & mask_lo);
    return next & -next;                   // the least significant bit set
}
Answer 7:
This should do what you want:
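number_of_tasks= <number of tasks, in the example this is 8>
// 'next' is the request mask, 'current' the one-hot current task
next_mask= current | (current - 1);            // current and all lower bits
next_barrel= next | (next << number_of_tasks); // duplicate the request bits
next_barrel&= ~next_mask;                      // drop the bits we don't want to consider
next_barrel&= -next_barrel;                    // lowest set bit
next_barrel|= next_barrel >> number_of_tasks;  // fold the high half back in
next_task_mask= next_barrel & -next_barrel;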
Basically, duplicate the bits of the next task mask, mask off the bits we don't want to consider, find the lowest set bit, fold the high bits back in, then take the lowest bit set. This runs in constant time.
Edit: Updating to take into account current == 00010000 and next_mask == 00111000
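A C sketch of the same sequence for 8 tasks, assuming the masking step is meant to clear the current task and all lower bits. The names `next_task`, `requests` and `skip` are made up for illustration, and the values from the edit are read as a current task of 00010000 with requests 00111000.

```c
#include <stdint.h>

#define NUMBER_OF_TASKS 8  /* assumed width */

uint8_t next_task(uint8_t current, uint8_t requests)
{
    /* current and everything below it must not be considered in the low copy */
    uint16_t skip   = (uint16_t)(current | (current - 1));
    /* duplicate the requests so that wrap-around shows up in the high copy */
    uint16_t barrel = (uint16_t)(requests | (requests << NUMBER_OF_TASKS));
    barrel &= (uint16_t)~skip;
    barrel &= (uint16_t)-barrel;              /* keep only the lowest set bit */
    barrel |= barrel >> NUMBER_OF_TASKS;      /* fold the high copy back in */
    return (uint8_t)(barrel & -barrel);       /* lowest bit after folding */
}
```

For the values in the edit, `skip` is `0x001F`, the doubled requests `0x3838` become `0x3820`, and the lowest remaining bit is `0x20`, i.e. task #5.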
Answer 8:
Untested, but off the top of my head, I'd be surprised if this didn't produce a reasonable synthesis... It has the advantage of being relatively readable (to me anyway), unlike typical bit-twiddling hacks.
for i in current'range loop
  current := rotate_left(current, 1);
  if or_reduce(mask and current) = '1' then
    current := mask and current;
    exit; -- stop at the first requesting slave; otherwise the loop rotates all the way round
  end if;
end loop;
Answer 9:
Complete parametrizable arbiter implementation that can be configured for round-robin or priority arbitration:
https://github.com/alexforencich/verilog-axis/blob/master/rtl/arbiter.v
This design uses a pair of priority encoders to select the next output in the sequence. The priority encoders used are implemented efficiently as trees.
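The idea of pairing two tree-shaped priority encoders can be illustrated with a small C model. This is not the code from the linked file; the names `priority_encode` and `next_slave_two_encoders` are invented, and `WIDTH` is assumed to be a power of two. One encoder only sees the requests above the current slave, the other sees all requests, and the second result is used when the first finds nothing. The halving loop mirrors a tree of depth log2(WIDTH).

```c
#include <stdint.h>

#define WIDTH 8  /* assumed width, must be a power of two */

/* Index of the lowest set bit, found by repeatedly halving the search range.
 * Returns -1 when no bit is set. */
static int priority_encode(uint32_t req)
{
    int index = 0;
    for (int half = WIDTH / 2; half >= 1; half /= 2) {
        uint32_t low = req & ((1u << half) - 1u);
        if (low) {
            req = low;          /* winner is in the lower half */
        } else {
            req >>= half;       /* winner, if any, is in the upper half */
            index += half;
        }
    }
    return (req & 1u) ? index : -1;
}

uint8_t next_slave_two_encoders(uint8_t current, uint8_t req)
{
    /* requests strictly above the current slave */
    uint8_t above = (uint8_t)(req & ~((current << 1) - 1));
    int masked   = priority_encode(above);  /* first encoder: masked requests */
    int unmasked = priority_encode(req);    /* second encoder: all requests */
    int winner   = (masked >= 0) ? masked : unmasked;
    return (winner >= 0) ? (uint8_t)(1u << winner) : 0;
}
```

In hardware both encoders would evaluate in parallel and a multiplexer would pick between them, so the critical path is one encoder tree plus the final selection.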