I am having trouble wrapping my head around how best to replicate some C code in an FPGA using a for-loop (not my first time being stuck on this).
The snippet of C code looks like this:
dot_product(&corr_sum, &sample_data_buffer[sample_index+d_circ_buf_size-sync_pattern_size], &sync_pattern[0], sync_pattern_size);
abs_corr_sum += abs(corr_sum);
Pretty straightforward: it takes the dot product of two complex vectors and accumulates the absolute value of the result.
And here was my attempt to replicate it:
always @(sample_index)
begin
    // for each incoming sample
    abs_corr_sum = 64'd0;
    corr_sum = 64'd0;
    for (index2 = 0; index2 < sync_pattern_size; index2 = index2 + 1'b1)
    begin
        corr_sum = sample_data_buffer_I[index2+sample_index+circ_buf_size-sync_pattern_size] * sync_pattern_I[index2]
                 + sample_data_buffer_Q[index2+sample_index+circ_buf_size-sync_pattern_size] * sync_pattern_Q[index2];
        //this is my quick and dirty abs(corr_sum) summer
        abs_corr_sum = (corr_sum < 0) ? abs_corr_sum + ~$signed(corr_sum)+1 : abs_corr_sum + corr_sum;
    end // for (index2 = 0; index2 < sync_pattern_size; index2 = index2 + 1'b1)
end //always @(sample_index)
Does this seem right? I am not getting the results I am expecting; and though the issue could be elsewhere, I think that this section is the most likely culprit.
To convert a piece of code coming from an algorithm with loops, conditionals, etc., into a synthesizable form of Verilog, you need to translate it into an FSM.
For example, a for loop that does something similar to what you are asking for would be:
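As a rough sketch only, a loop of this kind might look like the following in C. The function name `abs_corr_sum_ref`, the parameter `size` and the 32-bit/64-bit widths are assumptions made for illustration; the array names follow the signals described further down. For each element it forms the sum of the I and Q products and accumulates its absolute value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference loop: for each element, form the I/Q product sum
   and add its absolute value to a running total. */
int64_t abs_corr_sum_ref(const int32_t *sample_I, const int32_t *sample_Q,
                         const int32_t *sync_I, const int32_t *sync_Q,
                         size_t size)
{
    int64_t abscorrsum = 0;
    for (size_t i = 0; i < size; i++) {
        /* Widen before multiplying so the product cannot overflow 32 bits. */
        int64_t corrsum = (int64_t)sample_I[i] * sync_I[i]
                        + (int64_t)sample_Q[i] * sync_Q[i];
        abscorrsum += (corrsum < 0) ? -corrsum : corrsum;
    }
    return abscorrsum;
}
```

A loop like this is also handy as a golden model: feed it the same vectors as the hardware and compare the final sum.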
First, group the statements into time slots, so you can see which actions can be done in the same clock cycle (same state), and assign a state to each slot:
1)
2)
3)
4)
5)
States 2 and 3 may be merged into a single state, but that would force the synthesizer to infer two multipliers, and besides, the propagation delay of the resulting combinatorial path could be very high, limiting the maximum clock frequency for this design. So, I have split the dot product calculation into two parts, each one of them using a single multiplication operation. The synthesizer, if instructed to do so, can use one multiplier and share it between the two operations, as they happen in different clock cycles.
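To make the state split concrete, here is a small cycle-by-cycle software model in C, where each call to `fsm_step` stands for one clock edge. Only the split of the two multiplications into separate states comes from the text above. The roles of the other three states, the state names, the `fsm_t` struct, the helper `fsm_run` and the data widths are all assumptions for illustration, not the contents of the linked module.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state names; only the MUL_I / MUL_Q split is from the text. */
typedef enum { S_INIT, S_MUL_I, S_MUL_Q, S_ACC, S_CHECK } state_t;

typedef struct {
    state_t state;
    size_t  i;          /* element address, shared by all four memories */
    int64_t corrsum;    /* product sum for the current element */
    int64_t abscorrsum; /* running sum of |corrsum| */
    int     finish;     /* 1 once abscorrsum is valid */
} fsm_t;

/* One call models one clock cycle: at most one multiplication per state. */
void fsm_step(fsm_t *f, const int32_t *sample_I, const int32_t *sample_Q,
              const int32_t *sync_I, const int32_t *sync_Q, size_t size)
{
    switch (f->state) {
    case S_INIT:   /* clear the accumulator and the index */
        f->i = 0;
        f->abscorrsum = 0;
        f->finish = 0;
        f->state = S_CHECK;
        break;
    case S_MUL_I:  /* first multiplication */
        f->corrsum = (int64_t)sample_I[f->i] * sync_I[f->i];
        f->state = S_MUL_Q;
        break;
    case S_MUL_Q:  /* second multiplication, added to the first */
        f->corrsum += (int64_t)sample_Q[f->i] * sync_Q[f->i];
        f->state = S_ACC;
        break;
    case S_ACC:    /* accumulate the absolute value, advance the index */
        f->abscorrsum += (f->corrsum < 0) ? -f->corrsum : f->corrsum;
        f->i++;
        f->state = S_CHECK;
        break;
    case S_CHECK:  /* loop test: continue, or raise finish and hold */
        if (f->i < size)
            f->state = S_MUL_I;
        else
            f->finish = 1;
        break;
    }
}

/* Clock the model until finish is raised and return the result. */
int64_t fsm_run(const int32_t *sample_I, const int32_t *sample_Q,
                const int32_t *sync_I, const int32_t *sync_Q, size_t size)
{
    fsm_t f = { S_INIT, 0, 0, 0, 0 };
    while (!f.finish)
        fsm_step(&f, sample_I, sample_Q, sync_I, sync_Q, size);
    return f.abscorrsum;
}
```

In this model each element costs four cycles (two multiplies, one accumulate, one loop test), plus two cycles of setup. Merging the two multiply states would save a cycle per element at the cost of a second multiplier and a longer combinatorial path, which is the trade-off described above.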
which translates to this module: http://www.edaplayground.com/x/MEG
Signal rst is used to tell the module to start operation.
finish is raised by the module to signal the end of operation and the validity of the output (abscorrsum).
sample_I, sync_i, sample_Q and sync_Q are modeled using memory blocks, with i being the address of the element to read. Most synthesizers will infer block RAMs for these vectors, as each of them is read in only one state, and always with the same address signal.
Which can be tested with this simple test bench: