This is a follow-on question from How can I iteratively create buses of parameterized size to connect modules also iteratively created?. The answer is too complex to fit in a comment, and the solution may be helpful for other SO users. This question follows the self-answer format. Additional answers are encouraged.
The following code works and uses a two-dimensional array.
module Multiplier #(parameter M = 4, parameter N = 4)(
  input  [M-1:0]   A,   // Input A, size M
  input  [N-1:0]   B,   // Input B, size N
  output [M+N-1:0] P ); // Output P (product), size M+N

  wire [M+N-1:0] PP [N-1:0]; // Partial Product array
  assign PP[0] = { {N{1'b0}} , { A & {M{B[0]}} } }; // Pad upper bits with 0s
  assign P = PP[N-1]; // Product

  genvar i;
  generate
    for (i=1; i < N; i=i+1)
    begin: addPartialProduct
      wire [M+i-1:0] gA,gB,gS;
      wire Cout;
      assign gA = { A & {M{B[i]}} , {i{1'b0}} };
      assign gB = PP[i-1][M+i-1:0];
      assign PP[i] = { {(N-i){1'b0}}, Cout, gS}; // Pad upper bits with 0s
      RippleCarryAdder#(M+i) adder( .A(gA), .B(gB), .S(gS), .Cin(1'b0), .* );
    end
  endgenerate
endmodule
Some of the bits are never used, such as PP[0][M+N-1:M+1]. A synthesizer will usually remove these bits during optimization and possibly give a warning. Some synthesizers are not advanced enough to do this correctly. To work around this, the designer must implement extra logic. In this example, the width parameter for every RippleCarryAdder instance would be set to M+N. The extra logic wastes area and potentially degrades performance.
How can the unused bits be safely eliminated? Can multidimensional arrays with different dimensions be used? Will the end code be readable and debug-able?
Can multidimensional arrays with different dimensions be used? Short answer: no.
Verilog does not support multidimensional arrays whose entries have different sizes. SystemVerilog does support dynamic arrays; however, these cannot be connected to module ports and cannot be synthesized.
Embedded code (such as Perl's EP3, Ruby's eRuby/ruby_it, Python's prepro, etc.) can generate arrays with custom dimensions and code iterations, but the parameters must be hard-coded before compilation. The final value of any parameter of a given instance is resolved at compile time, well after the embedded script has run. The parameter must therefore be treated as a global constant, so Multiplier#(4,4) and Multiplier#(8,8) cannot exist in the same project unless the script is taught how to extract the full hierarchy and parameters of the project. (Good luck coding and maintaining that.)
How can the unused bits be safely eliminated? If the synthesizer is not advanced enough to exclude unused bits on its own, then the bits can be optimized away by flattening the multidimensional array into a one-dimensional array with intelligent part-selects. The trick is finding the equation for the part-select bounds, which can be achieved by following these steps:
1. Find the pattern of the lsb index for each part-select:
   - Assuming M is 4, the lsb values for each part-select are 0, 5, 11, 18, 26, 35, ... . Plug this pattern into WolframAlpha to find the equation a(n) = (n-1)*(n+8)/2.
   - Repeat with M equal to 3 for the pattern 0, 4, 9, 15, ... to get the equation a(n) = (n-1)*(n+6)/2.
   - Repeat again with M equal to 5 for the pattern 0, 6, 13, 21, 30, ... to get the equation a(n) = (n-1)*(n+10)/2.
   - Since the relation between M and N is linear (i.e. a multiple; no exponential, logarithmic, etc.), only two equations are needed to derive an equation with M as a variable. For non-linear relations, more data-point equations are recommended. In this case, note that M = 3, 4, 5 gives the pattern (n+6), (n+8), (n+10); therefore the generic equation is: lsb(n) = (n-1)*(n+2*M)/2
2. Find the pattern of the msb index for each part-select:
   - Use the same process as for the lsb (it ends up being msb(n) = (n**2+(M*2+1)*n-2)/2). Or define the msb in terms of the lsb: msb(n) = lsb(n+1)-1
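The same equations can also be reached without pattern matching. The derivation below is a sketch that assumes part-select n (counting from 1) is M+n bits wide, which is the width the question's code uses for PP[n-1]; the lsb of slice n is then the total width of all earlier slices.

```latex
\begin{aligned}
\mathrm{lsb}(n) &= \sum_{k=1}^{n-1} (M + k)
                 = (n-1)M + \frac{(n-1)n}{2}
                 = \frac{(n-1)(n + 2M)}{2} \\
\mathrm{msb}(n) &= \mathrm{lsb}(n+1) - 1
                 = \frac{n(n + 2M + 1)}{2} - 1
                 = \frac{n^2 + (2M+1)n - 2}{2}
\end{aligned}
```

For M = 4 this gives lsb(2) = 5 and lsb(3) = 11, matching the sequence in step 1. The product (n-1)(n+2M) is always even, so the integer division by 2 is exact.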
IEEE std 1364-2001 (Verilog 2001) introduced macros with arguments and indexed part-select; see § 19.3.1 '`define' and § 4.2.1 'Vector bit-select and part-select addressing' respectively. Or see IEEE std 1800-2012 § 22.5.1 '`define' and § 11.5.1 'Vector bit-select and part-select addressing' respectively. This answer assumes that these features are supported by the reader's simulator and synthesizer, since the generate keyword was also introduced in IEEE std 1364-2001; see § 12.1.3 'Generated instantiation' (and IEEE std 1800-2012 § 27 'Generate constructs'). For tools that do not fully support IEEE std 1364-2001, see the `ifdef examples provided here.
Since the functions that calculate the part-select ranges are used frequently, implement them as `define macros with arguments. This helps prevent copy/paste bugs. The extra sets of () in the macro definitions ensure the proper order of operations. It is also a good idea to `undef the macros at the end of the module definition, which keeps the global space from getting polluted. A flattened array can be challenging to debug. Defining pass-through connections within the generate block's for-loop makes the signals readable, and they can be probed in a waveform.
Working example with side-by-side and test bench: http://www.edaplayground.com/s/6/591
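A minimal sketch of what the flattened version could look like, under some assumptions: the macro names lsb, msb, and range are illustrative, and the behavioral RippleCarryAdder (with a hypothetical width parameter W) is only a stand-in so the example elaborates on its own. The port names A, B, S, Cin, and Cout come from the instantiation in the question. Cout is connected explicitly here instead of with .* so the sketch stays within Verilog-2001.

```verilog
// Part-select helpers; every argument is parenthesized to keep the
// order of operations intact when an expression such as i+1 is passed.
`define lsb(n)   ((((n)-1)*((n)+2*M))/2)
`define msb(n)   (`lsb((n)+1)-1)
`define range(n) `lsb(n) +: (M+(n))   // lsb +: width, slice n is M+n bits

// Stand-in adder (assumption): behavioral, W is a hypothetical parameter name.
module RippleCarryAdder #(parameter W = 4)(
  input  [W-1:0] A, B,
  input          Cin,
  output [W-1:0] S,
  output         Cout );
  assign {Cout, S} = A + B + Cin;
endmodule

module Multiplier #(parameter M = 4, parameter N = 4)(
  input  [M-1:0]   A,   // Input A, size M
  input  [N-1:0]   B,   // Input B, size N
  output [M+N-1:0] P ); // Output P (product), size M+N

  // Flattened partial-product array: slice n replaces PP[n-1]
  wire [`msb(N):`lsb(1)] PP;
  assign PP[`range(1)] = { 1'b0, A & {M{B[0]}} }; // M+1 bits, no extra padding
  assign P = PP[`range(N)];                       // last slice is M+N bits

  genvar i;
  generate
    for (i = 1; i < N; i = i + 1)
    begin: addPartialProduct
      // Pass-through wires: local names that can be probed in a waveform
      wire [M+i-1:0] gA, gB, gS;
      wire Cout;
      assign gA = { A & {M{B[i]}}, {i{1'b0}} };
      assign gB = PP[`range(i)];           // previous slice, M+i bits
      assign PP[`range(i+1)] = {Cout, gS}; // next slice, M+i+1 bits
      RippleCarryAdder #(M+i) adder( .A(gA), .B(gB), .S(gS),
                                     .Cin(1'b0), .Cout(Cout) );
    end
  endgenerate
endmodule

// Clean up the global macro space
`undef lsb
`undef msb
`undef range
```

Every bit of PP is driven and consumed in this layout, so there is nothing left for the synthesizer to prune. The macros refer to M textually, so they only make sense inside a module that defines a parameter named M.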
Will the end code be readable and debuggable? Yes, for anyone who has already learned how to properly use the generate construct. The generate block's for-loop defines local wires which are confined to the scope of the loop index. gA from loop-0 and gA from loop-1 are unique signals and cannot interact with each other. The local signals can be probed in a waveform, which is great for debugging.
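The scoping can be seen in a small standalone sketch. The module name ScopeDemo and the block label loop are hypothetical; the point is only that each iteration gets its own gA, reachable through a hierarchical name that includes the loop index.

```verilog
module ScopeDemo;
  genvar i;
  generate
    for (i = 0; i < 2; i = i + 1)
    begin: loop
      wire [3:0] gA;          // one independent gA per iteration
      assign gA = i + 4'd5;   // loop[0].gA is 5, loop[1].gA is 6
    end
  endgenerate

  initial begin
    #1; // let the continuous assignments settle
    // Each instance is addressed as <label>[<index>].<signal>
    $display("loop[0].gA=%0d loop[1].gA=%0d", loop[0].gA, loop[1].gA);
  end
endmodule
```

The same naming applies to the multiplier: the wires in the question's loop would appear in a waveform viewer under names such as addPartialProduct[1].gA.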