My problem is that when I implement my design using Xilinx ISE 14.7 + XPS, I often obtain a very different number of analyzed paths in the static timing analysis, even with very few differences in the .vhd files. In particular, the only file that I change (or that I think I change...) is something like:
    entity my_entity is
        port (
            ...
            data_in : in std_logic_vector(N*B-1 downto 0);
            ...
        );
    end entity my_entity;

    architecture bhv of my_entity is
        signal data     : std_logic_vector(B-1 downto 0);
        signal idx_vect : std_logic_vector(log2(N)-1 downto 0);
        signal idx      : integer range 0 to N-1;
        ...
    begin
        -- free-running counter that selects the current word
        process(clk)
        begin
            if rising_edge(clk) then
                idx_vect <= idx_vect + 1;
            end if;
        end process;

        idx  <= to_integer(unsigned(idx_vect));
        -- select word number idx (B bits wide) out of data_in
        data <= data_in((idx+1)*B-1 downto idx*B);
    end architecture bhv;
I'm not sure the problem comes from here, but I can't find any other possible cause for a fivefold decrease in the number of analyzed paths. Is there some syntax that one must avoid in order to obtain a correct implementation? Is it possible that indexing an array using an integer (as in the example code) somehow breaks up the paths, so that they are not analyzed?
The code change is something like:
    process(shift_reg, data_in)
    begin
        for i in 0 to N-1 loop
            if shift_reg(i) = '1' then
                data <= data_in((i+1)*B-1 downto i*B);
            end if;
        end loop;
    end process;
in which, instead of incrementing idx_vect, I have a circular one-hot shift register of N bits. Thanks in advance.
The coding style of the multiplexer at this line

    data <= data_in((idx+1)*B-1 downto idx*B);

can heavily influence the logic synthesis. This results in a very different number of paths to analyze for timing.
The original multiplexer
I first checked the synthesis of the above line using this small example:
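A minimal sketch of what such a test entity might look like, assuming N = 128 words of B = 32 bits (the widths implied by the multiplexer sizes mentioned in the reports further down). The entity name, the generics and the `idx` port are illustrative assumptions, not taken from the original listing:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test entity: selects one B-bit word out of N*B input bits
-- using a dynamically computed slice, as in the question.
entity mux_slice is
  generic (
    N : positive := 128;  -- number of words (assumed)
    B : positive := 32    -- bits per word (assumed)
  );
  port (
    idx     : in  natural range 0 to N-1;
    data_in : in  std_logic_vector(N*B-1 downto 0);
    data    : out std_logic_vector(B-1 downto 0)
  );
end entity mux_slice;

architecture rtl of mux_slice is
begin
  -- Both slice bounds are computed from idx at run time.
  data <= data_in((idx+1)*B-1 downto idx*B);
end architecture rtl;
```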
If one synthesizes this for a Spartan-6, XST reports this (excerpt):
Thus, no multiplexer was detected and the timing analyzer has to analyze a huge number of paths. The logic utilization is ok.
Optimized implementation
The same multiplexing can be achieved with: (EDIT: bugfix and simplification)
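One common way to express the same selection so that the tool sees a plain array lookup is to unpack the flat vector into an array of words first and then index that array. The following is a sketch under that assumption, not necessarily the exact code used for the report below; `mux_array`, `word_array_t` and `words` are illustrative names:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical variant: unpack data_in into N words, then index the array.
entity mux_array is
  generic (
    N : positive := 128;  -- number of words (assumed)
    B : positive := 32    -- bits per word (assumed)
  );
  port (
    idx     : in  natural range 0 to N-1;
    data_in : in  std_logic_vector(N*B-1 downto 0);
    data    : out std_logic_vector(B-1 downto 0)
  );
end entity mux_array;

architecture rtl of mux_array is
  type word_array_t is array (0 to N-1) of std_logic_vector(B-1 downto 0);
  signal words : word_array_t;
begin
  -- The slice bounds are static here because i is a generate constant.
  unpack : for i in 0 to N-1 generate
    words(i) <= data_in((i+1)*B-1 downto i*B);
  end generate unpack;

  -- A single array index: an N-to-1 selection for each output bit.
  data <= words(idx);
end architecture rtl;
```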
Now, the XST report looks much better:
It detects that a 128-to-1 multiplexer is required for each output bit. Optimized synthesis of such a wide multiplexer is built into the synthesis tool. The number of LUTs is reduced only slightly, but the number of paths to be processed by the timing analyzer is reduced dramatically, by a factor of 20!
Implementation using one-hot selector
The above examples use a binary-encoded selector signal. I also checked the variant with a one-hot encoded selector:
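A self-contained sketch of such a one-hot variant, modeled on the loop from the question. The entity name and the `sel` port are illustrative assumptions, and the default assignment is added here only so that the sketch does not infer a latch when no selector bit is set:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-hot variant: sel has one bit per input word.
entity mux_onehot is
  generic (
    N : positive := 128;  -- number of words (assumed)
    B : positive := 32    -- bits per word (assumed)
  );
  port (
    sel     : in  std_logic_vector(N-1 downto 0);
    data_in : in  std_logic_vector(N*B-1 downto 0);
    data    : out std_logic_vector(B-1 downto 0)
  );
end entity mux_onehot;

architecture rtl of mux_onehot is
begin
  process (sel, data_in)
  begin
    data <= (others => '0');  -- default value, avoids a latch
    for i in 0 to N-1 loop
      -- A later iteration overrides an earlier one, so the highest
      -- set bit of sel wins: this describes a priority chain.
      if sel(i) = '1' then
        data <= data_in((i+1)*B-1 downto i*B);
      end if;
    end loop;
  end process;
end architecture rtl;
```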
Now, the XST report is different again:
2-to-1 multiplexers are detected, because a priority multiplexer analogous to this scheme was described:
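As an illustration of such a scheme, here is a sketch reduced to four 32-bit inputs, with the loop written out as nested if-else stages. The entity, the port names `d0` to `d3` and `sel`, and the all-zero fallback are assumptions made for this example only:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-input priority multiplexer, written as nested
-- if-else stages so that each stage is visibly a 2-to-1 multiplexer.
entity mux_priority4 is
  port (
    sel            : in  std_logic_vector(3 downto 0);
    d0, d1, d2, d3 : in  std_logic_vector(31 downto 0);
    data           : out std_logic_vector(31 downto 0)
  );
end entity mux_priority4;

architecture rtl of mux_priority4 is
begin
  process (sel, d0, d1, d2, d3)
  begin
    if sel(3) = '1' then          -- highest priority
      data <= d3;
    else
      if sel(2) = '1' then
        data <= d2;
      else
        if sel(1) = '1' then
          data <= d1;
        else
          if sel(0) = '1' then    -- lowest priority
            data <= d0;
          else
            data <= (others => '0');  -- fallback when no bit is set
          end if;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```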
I have not used elsif here for didactic reasons. Each if-else stage is a 32-bit wide 2-to-1 multiplexer. The problem here is that the synthesis tool does not know that the selector is a one-hot encoded signal. Thus, a little more logic is required than in my optimized implementation.

The number of paths to analyze for timing changes significantly again. It is 10 times lower than in the original implementation, but 2 times higher than in my optimized one.