I am working towards implementing a channel decoder on an FPGA. Essentially, the problem boils down to this:
1) I have a matrix. I do some computations on the rows, then I do some computations on the columns.
The decoder basically picks up each row of the matrix, performs some operations, and moves on to the next row. It then does the same with the columns.
The decoder, however, operates on a 1023 x 1023 matrix, i.e., I have 1023 rows and 1023 columns.
Small test case that works: I first created a reg [1022:0] product_code[0:1], i.e., 2 rows and 1023 columns. The output is as expected, but LUT utilization is already approximately 9 percent. Then I increased the size to 10 rows and 1023 columns (reg [1022:0] product_code[0:9]), which also works as expected, but resource utilization goes up to 27 percent.
Now my goal is to get to 1023 rows and 1023 columns, but the design does not even synthesize. Is there a better way to store such a matrix on the FPGA?
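For reference, this is roughly the full-size declaration I am attempting, written in the same style as my small test cases (the comment figures are my own estimates, not tool output):

```verilog
// Full-size matrix: 1023 rows x 1023 one-bit columns.
// Declared this way, it asks for ~1023*1023 (about 1 Mbit) of
// discrete flip-flops, plus the multiplexing logic needed to
// index any row or column -- which is what seems to blow up
// synthesis, while the 2-row and 10-row versions still fit.
reg [1022:0] product_code [0:1022];
```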
I would really appreciate any feedback!