Why is a for loop in VHDL not working?

I tried simulating a for loop in ModelSim, but it is not working properly. I know it cannot be synthesized, but I don't know why it is not simulating properly.

The simulation does not report any errors, but the final result is "00000001" (i.e. incremented only once) instead of the expected output of "00000011".

library ieee;
use ieee.std_logic_1164.all;
use IEEE.STD_LOGIC_unsigned.ALL;

entity forloop is 
port(   enable      : in std_logic;         
     data_out       : out std_logic_vector(7 downto 0)); 
end forloop;

architecture behaviour of forloop is

signal temp     : std_logic_vector(7 downto 0) := "00000000";

begin

process(enable)         
begin   

    for i in 0 to 3 loop            
        temp <= temp +1;
    end loop;
end process;        
data_out <= temp;       
end behaviour;

Answer 1:

The simulation output is correct. A for loop in VHDL expands to parallel assignments, not the sequential assignments you would get in C. This means that

for i in 0 to 3 loop            
    temp <= temp +1;
end loop;

will become

    temp <= temp +1;
    temp <= temp +1;
    temp <= temp +1;
    temp <= temp +1;

Note that temp is a signal and is not updated until the next simulation (delta) cycle, so the +1 is applied to the same old value of temp in all 4 lines of code. Besides that, enable appears in the sensitivity list but is never used inside the process. And are you sure you want to use an asynchronous process?

Answer 2:

IEEE Std 1076-2008, 10.10 Loop statement:

A loop statement includes a sequence of statements that is to be executed repeatedly, zero or more times.

This has the effect of replicating the temp assignment as dieli indicates in his answer.

    process (enable)
    begin
        for i in 0 to 3 loop
            temp <= temp + 1;
        end loop;
    end process;

This process actually has two defects. You've noticed the first one.

First Defect

The first thing to note is that you are performing successive sequential signal assignments to the same signal, with the assignments located in the same process. Each process has only one set of drivers, and a process executes within a single simulation cycle until it is suspended, here by the implied wait statement on the sensitivity list that follows the last statement.

These assignments occur as successive iterations of a loop, and because a driver holds only one projected waveform value for any particular simulation time, only the last signal assignment actually takes effect. Essentially, each signal assignment schedules a signal update, and its projected value is overwritten by the next until only the last signal assignment is left.

No signal assignment updates a value while any process is still executing or pending in the current simulation cycle.
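The overwrite behaviour described above can be checked with a small simulation-only sketch. The entity name last_wins and the signal s are made up for this illustration and are not part of the question's code:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test entity, simulation only (no ports).
entity last_wins is
end entity last_wins;

architecture sim of last_wins is
    signal s : std_logic_vector(7 downto 0) := x"00";
begin
    process
    begin
        s <= x"01";  -- schedules x"01" on the driver of s
        s <= x"02";  -- replaces the pending x"01"
        s <= x"03";  -- replaces the pending x"02"

        -- The process has not suspended yet, so s still has its old value.
        assert s = x"00"
            report "s changed before the process suspended"
            severity failure;

        wait for 0 ns;  -- suspend for one delta cycle so the update is applied

        -- Only the last scheduled value reaches the signal.
        assert s = x"03"
            report "expected only the last assignment to take effect"
            severity failure;

        wait;  -- suspend forever
    end process;
end architecture sim;
```

If both assertions pass silently, the simulator has confirmed that the three assignments collapse into one update, which is what happens to temp in the loop.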

A signal assignment whose waveform element has no delay causes a delta cycle: all pending signal updates for the next queued simulation time are applied after the current simulation cycle has completed and before execution of the delta simulation cycle begins. Also see this answer - The VHDL Simulation Cycle for a big picture view of what happens during a simulation cycle.

The in-depth description is easier to read in previous revisions of the IEEE standard; see IEEE Std 1076-2002/-2000/-1993, 12.6 Execution of a model. The description in 14.7 of the -2008 standard has had its readability encumbered by additions to the standard for VHPI and force assignments.

There are several ways to overcome the effects of successive writes to a signal in the same process and occurring during the execution of the same simulation cycle.

Before that let's address the second defect.

Second Defect

Your sensitivity list contains one element, enable, which triggers execution of the loop statement. This will occur for any event on enable, whether the new value is one of the four binary values represented by std_ulogic ('0', '1', 'L', 'H') or one of its meta-values. That includes a transition from '1' to '0', from '0' to '1', or a combination with 'H' and 'L', which are treated identically to '1' and '0' in synthesis.

If you want a latch based counter you could qualify that with a particular value:

    process (enable)
    begin
        if enable = '1' then
            for i in 0 to 3 loop
                temp <= temp + 1;
            end loop;
        end if;
    end process;

This would cause the loop statement to only execute when enable transitioned to a '1'.

We typically don't use latch based counters. As you might note, this one behaves as if it were edge triggered only because enable is the sole entry in the sensitivity list; if you were to expand the sensitivity list with signals used elsewhere in the process but not in the loop, your temp accumulator would also be incremented by those other events.

Note that you can't get a new value for temp while executing the process, so as written it only increments once for each event on enable, even with the loop statement. You effectively assign the new value of temp + 1 during the last iteration of the loop (i = 3), and when execution of the process is done (as well as that of any other processes currently active) temp gets updated.

So how can we get temp incremented four times?

We could simply add 4 to it. That doesn't give us a general solution.

We could declare a process variable, assign it the value of temp when the process executes, and assign that to temp after executing the loop statement:

    process (enable)
        variable ltemp: std_logic_vector(7 downto 0);
    begin
        ltemp := temp;
        if enable = '1' then
            for i in 0 to 3 loop
                ltemp := ltemp + 1;
            end loop;
            temp <= ltemp;
        end if;
    end process;

This may not be synthesis eligible, depending on the synthesis tool, because as dieli notes synthesis wants to treat the variable assignments as parallel operations, while we'd be counting on it to be the equivalent of:

            ltemp := ltemp + 1 + 1 + 1 + 1;

This requires a smarter synthesis tool capable of sequencing the additions.
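For simulation, the variable approach can be checked with a self-contained sketch like the one below. It is only an illustration under some assumptions: the testbench name forloop_var_tb and the 10 ns delays are invented here, and ieee.numeric_std with an unsigned signal is swapped in for std_logic_unsigned so that temp can be compared against integers.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- assumption: used instead of std_logic_unsigned

-- Hypothetical testbench, simulation only.
entity forloop_var_tb is
end entity forloop_var_tb;

architecture sim of forloop_var_tb is
    signal enable : std_logic := '0';
    signal temp   : unsigned(7 downto 0) := (others => '0');
begin
    -- Same idea as the process above: accumulate in a variable,
    -- then assign the result to the signal once.
    count : process (enable)
        variable ltemp : unsigned(7 downto 0);
    begin
        if enable = '1' then
            ltemp := temp;
            for i in 0 to 3 loop
                ltemp := ltemp + 1;  -- variables update immediately
            end loop;
            temp <= ltemp;           -- single scheduled signal update
        end if;
    end process count;

    stimulus : process
    begin
        wait for 10 ns;
        enable <= '1';               -- first event to '1'
        wait for 10 ns;
        assert temp = 4 report "expected 4 after first enable" severity failure;

        enable <= '0';               -- event to '0' must not count
        wait for 10 ns;
        assert temp = 4 report "temp changed while enable = '0'" severity failure;

        enable <= '1';               -- second event to '1'
        wait for 10 ns;
        assert temp = 8 report "expected 8 after second enable" severity failure;

        wait;                        -- end of stimulus
    end process stimulus;
end architecture sim;
```

Because the loop runs for i = 0 to 3, temp advances by four on each event that takes enable to '1'.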

Ensuring Synthesis eligibility

We could overcome this issue by using an array of ltemp values, something like this:

    process (enable, temp)
        type ltemp_array is array (0 to 3) of std_logic_vector(7 downto 0);
        variable ltemp: ltemp_array;
    begin
        if enable = '1' then
            for i in 0 to 3 loop
                if i = 0 then
                    ltemp(i) := temp + 1;
                else
                    ltemp(i) := ltemp(i-1) + 1;
                end if;
            end loop;
            temp <= ltemp(3);
        end if;
    end process;

Notice temp has been added to the sensitivity list because latches are transparent while their enable is TRUE.

The condition enable = '1' isn't of any of the forms given in the now withdrawn IEEE Std 1076.6-2004 for RTL synthesis as an edge sensitive event (all of which require a second term in the condition). Synthesis tools essentially ignore the sensitivity list, meaning we depend on rule c) of 1076.6 6.2.1.1 Level-sensitive storage from process with sensitivity list:

There are executions of the process that do not execute an explicit assignment (via an assignment statement) to the signal (or variable).

and the following:

The process sensitivity list shall contain all signals read within the process statement.

Which tells us temp should have been in the sensitivity list all along to make the process synthesis eligible. Incidentally also making the first two forms of the process shown in this answer synthesis ineligible.

This also reveals a third defect when intending to synthesize the process: it has a feedback loop through temp. You'd actually notice the effect of adding temp to the sensitivity list, and thereby causing feedback, during simulation of the process shown above. You'd either run into a delta cycle limit if your simulation tool has one, or the simulator would hang, or it could depend on some other implementation-dependent mechanism to detect an invalid simulation.

And the way to get rid of that feedback loop is to use an edge-sensitive clock.

And that also brings up the idea of using separate processes (which can be inferred from, say, separate concurrent conditional signal assignments to ltemp elements promoted to signals) when we can't perform all the additions in one clock cycle.
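A minimal sketch of what a clocked version might look like follows. The clk port and the entity name forloop_clocked are assumptions added for illustration (the original entity has no clock), and ieee.numeric_std is used here in place of std_logic_unsigned:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- assumption: used instead of std_logic_unsigned

entity forloop_clocked is
    port (
        clk      : in  std_logic;  -- hypothetical clock input, not in the original
        enable   : in  std_logic;
        data_out : out std_logic_vector(7 downto 0)
    );
end entity forloop_clocked;

architecture rtl of forloop_clocked is
    signal temp : unsigned(7 downto 0) := (others => '0');
begin
    -- Only clk is in the sensitivity list; temp is read on the clock edge,
    -- so there is no combinational feedback loop through temp.
    process (clk)
        variable ltemp : unsigned(7 downto 0);
    begin
        if rising_edge(clk) then
            if enable = '1' then
                ltemp := temp;
                for i in 0 to 3 loop
                    ltemp := ltemp + 1;  -- variable updates immediately
                end loop;
                temp <= ltemp;           -- registered on this clock edge
            end if;
        end if;
    end process;

    data_out <= std_logic_vector(temp);
end architecture rtl;
```

With this structure temp advances by four on every rising clock edge while enable is '1', and the register breaks the feedback path that caused the runaway delta cycles.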