I want to implement the K&R algorithm to calculate the Hamming weight of a 256-bit vector. I have written my code in VHDL as:
```vhdl
entity counter_loop is
    Port ( dataIn : in STD_LOGIC_VECTOR (255 downto 0);
           dataOut : out STD_LOGIC_VECTOR (8 downto 0);
           threshold : in STD_LOGIC_VECTOR (8 downto 0);
           clk : in STD_LOGIC;
           flag : out STD_LOGIC);
end counter_loop;

architecture Behavioral of counter_loop is
    signal val : STD_LOGIC_VECTOR (255 downto 0) := X"FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF";
begin
    process (clk)
        variable count : STD_LOGIC_VECTOR (8 downto 0):= "000000000";
    begin
        flag <= '0';
        val <= dataIn;
        --if(clk'event and clk = '1') then
        while (val > 0) loop
            count := count+1;
            val <= (val and (val-1));
            if (count > threshold) then
                flag <= '1';
            end if;
        end loop;
        dataOut <= count;
        --end if;
    end process;
end Behavioral;
```
But when I synthesize it with the Xilinx tools, I get this error:
Line 53: Non-static loop limit exceeded
Any clues, please?
P.S.: Line 53 is `while (val > 0) loop`
You need to learn about the difference between a `signal` and a `variable`.

When you assign to a `signal`, you only schedule a change for the next point at which time moves on (in a clocked process like yours, this is when your process gets to the end, and all the other processes currently scheduled for execution have too).

So when you write `val <= something` in a loop in the process, `val` is only ever being scheduled to be updated. When the process inspects the value of `val`, it sees the current value, not the scheduled one. You need to use a variable to keep track of things in this way.

However, as noted elsewhere, if you just want to count the ones, it's much easier:
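A minimal sketch of such a simpler count, assuming `IEEE.NUMERIC_STD` is available. The entity name `popcount_simple` is hypothetical, and only the data ports from the question are kept. It replaces the `while` loop with a `for` loop over the vector's range and keeps the running total in a variable:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity name; ports mirror the question's data path only.
entity popcount_simple is
    Port ( dataIn  : in  STD_LOGIC_VECTOR (255 downto 0);
           dataOut : out STD_LOGIC_VECTOR (8 downto 0));
end popcount_simple;

architecture Behavioral of popcount_simple is
begin
    process (dataIn)
        -- A variable updates immediately, so each iteration sees the running total.
        variable count : unsigned(8 downto 0);
    begin
        count := (others => '0');
        -- The loop bounds come from the vector's range, so they are static.
        for i in dataIn'range loop
            if dataIn(i) = '1' then
                count := count + 1;
            end if;
        end loop;
        dataOut <= std_logic_vector(count);
    end process;
end Behavioral;
```

Because the iteration count is fixed at 256 regardless of the data, the synthesizer can unroll the loop, which is not possible for a loop whose exit depends on a run-time value.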
So, I'm going to ignore issues of things actually meeting timing (`val - 1` is expensive) and actually talk about your logic.

Here's the relevant piece of your code: the `while (val > 0) loop`, whose body assigns `val <= (val and (val-1));`.

`val` is a signal, not a variable. That means it will only be updated when you finish the delta cycle, which in this case will be never, because the loop condition keeps seeing the old value. So you have an infinite loop.

If you're just trying to calculate the popcount of a number, then why not simply loop over the bits and count the ones? I doubt that will meet timing either, though (you probably need to break it up over multiple clock cycles).
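One way the count could be broken up over clock cycles, offered as a sketch rather than a tuned design. The entity name `popcount_pipelined`, the `partial` array, and the choice of eight 32-bit slices are all assumptions made for illustration; the ports are the ones from the question. The first stage counts the ones in each slice, and the second stage adds the partial sums registered on the previous clock edge:

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical entity name; port list taken from the question.
entity popcount_pipelined is
    Port ( clk       : in  STD_LOGIC;
           dataIn    : in  STD_LOGIC_VECTOR (255 downto 0);
           threshold : in  STD_LOGIC_VECTOR (8 downto 0);
           dataOut   : out STD_LOGIC_VECTOR (8 downto 0);
           flag      : out STD_LOGIC);
end popcount_pipelined;

architecture Behavioral of popcount_pipelined is
    -- Eight partial sums, one per 32-bit slice; each fits in 6 bits (0 to 32).
    type partial_array is array (0 to 7) of unsigned(5 downto 0);
    signal partial : partial_array := (others => (others => '0'));
begin
    process (clk)
        variable slice_count : unsigned(5 downto 0);
        variable total       : unsigned(8 downto 0);
    begin
        if rising_edge(clk) then
            -- Stage 1: count the ones in each 32-bit slice of the input.
            for s in 0 to 7 loop
                slice_count := (others => '0');
                for b in 0 to 31 loop
                    if dataIn(32*s + b) = '1' then
                        slice_count := slice_count + 1;
                    end if;
                end loop;
                partial(s) <= slice_count;  -- scheduled; visible after this edge
            end loop;

            -- Stage 2: add the partial sums that were registered on the previous edge.
            total := (others => '0');
            for s in 0 to 7 loop
                total := total + resize(partial(s), 9);
            end loop;
            dataOut <= std_logic_vector(total);

            -- Same comparison as in the question: flag is set when the count exceeds threshold.
            if total > unsigned(threshold) then
                flag <= '1';
            else
                flag <= '0';
            end if;
        end if;
    end process;
end Behavioral;
```

Stage 2 reads `partial` as a signal, so it sees the values from the previous edge. With this arrangement the result for a given input appears on `dataOut` two clock edges after that input is presented. Whether this meets timing on a particular device still has to be checked against the tool's timing report.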
And finally, most people would argue that algorithms designed for C code often perform poorly in hardware, because hardware has different capabilities than a fixed processor.