Efficient (time and space complexity) data structure for dense and sparse matrix

Published 2019-02-21 14:10

Question:

I have to read a file that stores a matrix of cars (1 = BlueCar, 2 = RedCar, 0 = Empty).

I need to write an algorithm that moves the cars in the matrix as follows:

  • blue cars move downward;
  • red cars move rightward;
  • all the blue cars move in one turn, then all the red cars move in the next turn.

Before the file is read I don't know the matrix size, nor whether it's dense or sparse, so I have to implement two data structures (one for the dense case and one for the sparse case) and two algorithms.

I need to reach the best time and space complexity possible.

Since the matrix size is unknown until the file is read, I plan to store the data on the heap.

If the matrix is dense, I plan to use something like:

short int** M      = new short int*[m];     // row pointers
short int*  M_data = new short int[m * n];  // one contiguous block

for (int i = 0; i < m; ++i)
{
    M[i] = M_data + i * n;  // point each row into the contiguous block
}

With this structure I can allocate one contiguous block of memory, and it is also simple to access with M[i][j].
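To show how a turn could work on the dense layout, here is a sketch of a blue-car turn (using a vector-of-vectors stand-in for the M[i][j] layout so the snippet is self-contained, and assuming each car advances one cell per turn and stops at the edge, which the problem statement doesn't spell out):

```cpp
#include <cassert>
#include <vector>

// One blue-car turn on a dense m x n grid (1 = blue, 2 = red, 0 = empty).
// Assumption: each car advances one cell per turn and stops at the bottom
// edge; the question doesn't say whether cars wrap around.
void moveBlueTurn(std::vector<std::vector<short>>& M)
{
    if (M.empty()) return;
    const int m = static_cast<int>(M.size());
    const int n = static_cast<int>(M[0].size());
    // Bottom-up scan: a car that just moved into row i+1 is never revisited.
    for (int i = m - 2; i >= 0; --i)
        for (int j = 0; j < n; ++j)
            if (M[i][j] == 1 && M[i + 1][j] == 0) {
                M[i + 1][j] = 1;
                M[i][j]     = 0;
            }
}
```

A red-car turn is symmetric: scan columns right-to-left and test M[i][j + 1] instead.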

Now the problem is which structure to choose for the sparse case. I also have to consider how the algorithm can move the cars in the simplest way: for example, when I evaluate a car, I need an easy way to find out whether the next position (downward or rightward) holds another car or is empty.

Initially I thought of defining BlueCar and RedCar classes that inherit from a general Car class. In these objects I can store the matrix coordinates and then put them in:

std::vector<BlueCar> sparseBlu;
std::vector<RedCar>  sparseRed;

Otherwise I can do something like:

std::vector<std::tuple<int, int, short>> sparseMatrix;  // (row, column, value)

But the problem of finding what's in the next position still remains.

Probably this is not the best way to do it, so how can I implement the sparse case efficiently? (ideally using a single structure for the sparse case)
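One option (a sketch, not something from the question) that keeps a single structure for the sparse case and makes the "what's in the next position" check cheap is a hash map keyed on (row, column); the neighbour lookup is then an expected O(1) find:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical single sparse structure: only occupied cells are stored,
// keyed by (row, column) packed into one 64-bit integer.
struct SparseGrid {
    std::unordered_map<std::uint64_t, short> cells;  // value: 1 = blue, 2 = red

    static std::uint64_t key(int r, int c) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(r)) << 32)
             | static_cast<std::uint32_t>(c);
    }
    short at(int r, int c) const {           // 0 means empty
        auto it = cells.find(key(r, c));
        return it == cells.end() ? 0 : it->second;
    }
    void set(int r, int c, short v) {        // v = 0 erases the cell
        if (v == 0) cells.erase(key(r, c));
        else        cells[key(r, c)] = v;
    }
};
```

Space is O(number of cars), and moving one car costs one `at` on the target cell plus two `set` calls, so each turn is linear in the number of cars of that colour.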

Answer 1:

Why not simply create a memory mapping directly over the file? (assuming your data 0,1,2 is stored in contiguous bytes (or bits) in the file, and the position of those bytes also represents the coordinates of the cars)

This way you don't need to allocate extra memory and read in all the data, and the data can simply and efficiently be accessed with M[i][j].

Going over the rows would be L1-cache friendly.
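A minimal POSIX sketch of that mapping, assuming the file stores one byte per cell with rows laid out contiguously (so cell (i, j) lives at byte offset i * n + j):

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map the whole file read/write; returns nullptr on failure.
unsigned char* mapMatrixFile(const char* path, std::size_t& size)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return nullptr; }
    size = static_cast<std::size_t>(st.st_size);
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<unsigned char*>(p);
}
// Cell (i, j) is then data[i * n + j]; call munmap(data, size) when done.
```

With MAP_SHARED, writes go back to the file through the page cache, so moving a car is just an in-place byte swap with no explicit read or write calls.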

In case of very sparse data, you could scan through the data once and keep a list of the empty regions/blocks in memory (only need to store startpos and size), which you could then skip (and adjust where needed) in further runs.

With memory mapping, only frequently accessed pages are kept in memory. This means that once you have scanned for the empty regions, memory will only be allocated for the frequently accessed non-empty regions (all this will be done automagically by the kernel - no need to keep track of it yourself).

Another benefit is that you are accessing the OS disk cache directly. Thus no need to keep copying and moving data between kernel space and user space.

To further optimize disk space and memory usage, the cars could be stored in 2 bits each in the file.
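Packing four cells per byte could look like this (a sketch: the values 0/1/2 fit in 2 bits, and idx stands for the linear cell index i * n + j):

```cpp
#include <cassert>
#include <cstddef>

// Read the 2-bit cell at linear index idx (0 = empty, 1 = blue, 2 = red).
inline int getCell(const unsigned char* data, std::size_t idx)
{
    return (data[idx >> 2] >> ((idx & 3) * 2)) & 3;
}

// Overwrite the 2-bit cell at linear index idx with v.
inline void setCell(unsigned char* data, std::size_t idx, int v)
{
    const std::size_t byte  = idx >> 2;
    const int         shift = static_cast<int>(idx & 3) * 2;
    data[byte] = static_cast<unsigned char>(
        (data[byte] & ~(3 << shift)) | ((v & 3) << shift));
}
```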

Update:

I'll have to move cars with OpenMP and MPI... Will the memory mapping also work with concurrent threads?

You could certainly use multithreading, but I'm not sure OpenMP would be the best solution here: if you work on different parts of the data at the same time, you may need to check some overlapping regions (i.e. a car could move from one block to another).

Or you could let the threads work on the middle parts of the blocks, and then start other threads to do the boundaries (with red cars that would be one byte, with blue cars a full row).

You would also need a locking mechanism for adjusting the list of the sparse regions. I think the best way would be to launch separate threads (depending on the size of the data of course).



Answer 2:

In a somewhat similar task, I simply made use of Compressed Row Storage.

The Compressed Row and Column (in the next section) Storage formats are the most general: they make absolutely no assumptions about the sparsity structure of the matrix, and they don't store any unnecessary elements. On the other hand, they are not very efficient, needing an indirect addressing step for every single scalar operation in a matrix-vector product or preconditioner solve.

You will need to be a bit more specific about time and space complexity requirements. CSR requires an extra indexing step for simple operations, but that is a minor amount of overhead if you're just doing simple matrix operations.
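For reference, a minimal CSR sketch (not the linked implementation): non-zeros are stored row by row in values/col_idx, and row_ptr[i]..row_ptr[i+1] delimits row i's slice, which is where the extra indirect addressing step mentioned above comes from:

```cpp
#include <cassert>
#include <vector>

// Minimal Compressed Row Storage: non-zero cells stored row by row.
struct CsrMatrix {
    std::vector<short> values;   // non-zero values, row-major
    std::vector<int>   col_idx;  // column of each stored value
    std::vector<int>   row_ptr;  // row i occupies [row_ptr[i], row_ptr[i+1])

    short at(int r, int c) const {  // the indirect addressing step
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            if (col_idx[k] == c) return values[k];
        return 0;  // not stored => empty
    }
};
```

Note that CSR favours scanning a whole row; moving a blue car down means relocating an entry between two row slices, which is O(non-zeros) per move with plain vectors, so for this task the format suits the red (rightward) turn better than the blue one.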

There's already an existing C++ implementation available online as well.