I'm new to FPGAs and HDLs, and I can't figure this out. How can I calculate or estimate the propagation delay through several levels of combinational logic? Can I only determine this empirically, or can I work it out at design time? In this case I'm using an FPGA to implement a parity setting and checking circuit. The circuit would look like a tree network of XOR gates like the example pictures, except that I intend to XOR 16 registers, so there will be more levels of XOR operations. I would like to be able to calculate the propagation delay through each "level" of XOR logic so I can determine what fraction of a clock cycle, or how many nanoseconds, the entire parity checking and setting operation will take. I hope I'm making sense.
Thanks a lot for the help.
You need "The Knowledge" as I explain here in "The Art of High Performance FPGA Design". http://www.fpgacpu.org/log/aug02.html#art "You have to ... crank up your tools and design some test circuits, and then open up the timing analyzer and the FPGA editor and pour over what came out, what the latencies (logic and routing) tend to be, etc."
After you do that for a while, you will look at this kind of question, and just know (or have a pretty good idea).
In this case, for example, I know that in an FPGA a 16-input XOR will be built out of a tree of 4- or 6-input lookup tables (4-LUTs or 6-LUTs) two levels deep. It cannot be implemented in a circuit only one LUT deep, because 16 inputs do not fit in a single 4- or 6-input LUT. Therefore the minimum delay for such a circuit in a pipelined implementation is going to be the sum of (in Xilinx timing nomenclature):
tCKO -- clock-to-output delay of any of the 16 flip-flops
tILO -- delay through the first-level LUTs
tAS -- delay through the second-level LUT plus flip-flop setup time, assuming both are implemented in the same slice
- plus net routing delays
and for Virtex-6 speed -1 I would expect this to be ~1.5 ns.
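The "two deep" figure can be checked with a little arithmetic. The Python sketch below uses a hypothetical helper, `lut_levels`, and assumes each k-input LUT can absorb an XOR of up to k signals. It only counts levels of logic and says nothing about routing delay.

```python
def lut_levels(num_inputs, lut_size):
    """Levels of LUTs needed to reduce num_inputs signals to one output."""
    if num_inputs < 1 or lut_size < 2:
        raise ValueError("need at least 1 input and LUTs with 2+ inputs")
    levels = 0
    signals = num_inputs
    while signals > 1:
        # Each LUT consumes up to lut_size signals and produces one.
        signals = -(-signals // lut_size)  # ceiling division
        levels += 1
    return levels


for k in (4, 6):
    print(f"16-input XOR with {k}-LUTs: {lut_levels(16, k)} levels")
```

Both LUT sizes give two levels for 16 inputs, which is why the path has exactly one tILO term and one tAS term.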
As others have said, the component switching delay data is in the data sheets for your device in question, but the net routing delays are not. Indeed, in time, you may even start to remember the key delays and develop a sense for how many FPGA primitives like LUTs you can use and still make a particular clock period / clock frequency target.
Anyway I just tried this with some throwaway Verilog I coded up:
    module t(clk, i, o);
        input clk;
        input [15:0] i;
        output reg o;
        reg [15:0] d;

        always @(posedge clk) begin
            d <= i;   // register the 16 inputs
            o <= ^d;  // reduction XOR (parity) of the registered inputs
        end
    endmodule
and a simple UCF file:
net clk period = 1.5 ns;
and the total delay in my device was about 1.3 ns. Try it for yourself and see!
Here is one path from the static timing analyzer output:
Paths for end point o (SLICE_X3Y68.A5), 6 paths
--------------------------------------------------------------------------------
Slack (setup path): 0.198ns (requirement - (data path - clock path skew + uncertainty))
Source: d_13 (FF)
Destination: o (FF)
Requirement: 1.500ns
Data Path Delay: 1.248ns (Levels of Logic = 2)
Clock Path Skew: -0.019ns (0.089 - 0.108)
Source Clock: clk_BUFGP rising at 0.000ns
Destination Clock: clk_BUFGP rising at 1.500ns
Clock Uncertainty: 0.035ns
Clock Uncertainty: 0.035ns ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
Total System Jitter (TSJ): 0.070ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.000ns
Phase Error (PE): 0.000ns
Maximum Data Path at Slow Process Corner: d_13 to o
Location             Delay type         Delay(ns)  Physical Resource
                                                   Logical Resource(s)
-------------------------------------------------  -------------------
SLICE_X3Y67.BQ       Tcko                  0.337   d<15>
                                                   d_13
SLICE_X2Y68.A2       net (fanout=1)        0.590   d<13>
SLICE_X2Y68.A        Tilo                  0.068   d<11>
                                                   d[15]_reduce_xor_21_xo<0>1
SLICE_X3Y68.A5       net (fanout=1)        0.180   d[15]_reduce_xor_21_xo<0>
SLICE_X3Y68.CLK      Tas                   0.073   d<10>
                                                   d[15]_reduce_xor_21_xo<0>3
                                                   o
-------------------------------------------------  ---------------------------
Total                                      1.248ns (0.478ns logic, 0.770ns route)
                                                   (38.3% logic, 61.7% route)
As you can see, the logic delays from the datasheets are only about 480 ps, whereas the net routing delays are 770 ps, and clock skew and uncertainty add a bit more, for a total of about 1.3 ns. This is actually faster than the component switching limit (Fmax) on the global clock tree of 700 MHz / 1.43 ns...
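To see how the numbers in the report fit together, here is a small Python sketch that redoes the arithmetic using only values copied from the path above. The variable names are mine, not the tool's.

```python
# (delay type, delay in ns) for each element on the reported path d_13 -> o
path = [
    ("Tcko", 0.337),  # clock-to-output of the source flip-flop
    ("net", 0.590),   # routing to the first-level LUT
    ("Tilo", 0.068),  # first-level LUT
    ("net", 0.180),   # routing to the second-level LUT
    ("Tas", 0.073),   # second-level LUT plus flip-flop setup
]

route = sum(delay for kind, delay in path if kind == "net")
logic = sum(delay for kind, delay in path if kind != "net")
data_path = logic + route

requirement = 1.500   # ns, from the UCF period constraint
clock_skew = -0.019   # ns, from the report
uncertainty = 0.035   # ns, from the report

# Slack formula as printed in the report header.
slack = requirement - (data_path - clock_skew + uncertainty)
min_period = requirement - slack

print(f"logic {logic:.3f} ns, route {route:.3f} ns, total {data_path:.3f} ns")
print(f"slack {slack:.3f} ns, min period {min_period:.3f} ns, "
      f"about {1000.0 / min_period:.0f} MHz")
```

This reproduces the report's 0.478 ns logic, 0.770 ns route and 0.198 ns slack, which means this particular path would still pass with a period of about 1.302 ns.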
So in summary, as you try some test circuits and try tuning them, you will gain experience that helps you estimate how fast your circuit will run when implemented in FPGA primitives like LUTs.
And if it really matters, there is no substitute for taking the design through synthesis, place-and-route, and static timing analysis. Don't forget to add timing constraints to give the tools something to target, and then experiment with lowering the clock period constraint iteratively until you converge on a minimum period.
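The "lower the period until it stops passing" loop can be sketched as a binary search. This Python sketch is only an illustration: `meets_timing` is a hypothetical callback standing in for a full synthesis, place-and-route and timing analysis run, and `fake_tool_run` is a made-up stand-in so the example runs on its own. Real tools are not perfectly monotonic (a tighter constraint sometimes makes them try harder), so treat the result as a starting point.

```python
def find_min_period(meets_timing, lo_ns, hi_ns, step_ns=0.01):
    """Binary-search the smallest period for which meets_timing(period) is True.

    meets_timing is a hypothetical callback that would rerun the tools with
    the given period constraint. Assumes hi_ns passes and lo_ns fails.
    """
    while hi_ns - lo_ns > step_ns:
        mid = (lo_ns + hi_ns) / 2
        if meets_timing(mid):
            hi_ns = mid  # constraint met: try a tighter period
        else:
            lo_ns = mid  # constraint missed: relax
    return hi_ns


# Stand-in for a real tool run: pretend the design closes at 1.302 ns or slower.
def fake_tool_run(period_ns):
    return period_ns >= 1.302


best = find_min_period(fake_tool_run, lo_ns=1.0, hi_ns=2.0)
print(f"smallest passing period found: {best:.3f} ns")
```

Each call to the real callback would be a complete tool run, so a coarse `step_ns` keeps the number of runs small.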
Happy hacking!
You can estimate the propagation delays through several stages of logic only if you have timing models that provide delays as a function of temperature, supply voltage and manufacturing process variation for all of your components. In the IC world, this is done automatically using static timing analysis tools. I'm not sure about FPGA design methodologies.
As Oli Charlesworth mentions, the overall delay also depends on interconnect wire delays. Other factors are: input drive strength and output load.
Theoretically it is possible to get the propagation delays in an FPGA without coding, but it is not going to be easy.
The easiest way to do so is to create a very simple project with the I/O signals you need, write the code in VHDL or Verilog (or even use schematic capture), synthesize and route the design, and then look at the report file generated by the tool to see the actual delays.
To understand some of the parameters in the report file, you can look into the "DC and Switching Characteristics" document provided by all FPGA companies. For example, for Spartan 6 family devices from Xilinx, it is: http://www.xilinx.com/support/documentation/data_sheets/ds162.pdf
Hope this helps,
/Farhad
It's the kind of thing you get a feel for as you do more coding on a particular platform, but it's part of the art of being a good RTL engineer.
As you write your code, put it through both simulation and synthesis. Make sure you understand the timing paths that the synthesis tool reports, and have a good mental image of the logic you're describing. If you find yourself hugely out with respect to timing, then you need to rethink your design, but do this early. There's nothing worse than spending time on a design, getting it working and passing all its tests, just to find out it's not fast enough.
Then you change your target FPGA or technology library, and you have to readjust all your expectations.