Fastest file reading in a multi-threaded application

Posted 2019-04-24 10:52

Question:

I have to read an 8192x8192 matrix into memory. I want to do it as fast as possible.
Right now I have this structure:

char inputFile[8192][8192*4]; // I know the numbers are at max 3 digits
int8_t matrix[8192][8192]; // Matrix to be populated

// Read entire file line by line using fgets; stop at 8192 lines so the
// buffer cannot overflow, and count only lines that were actually read
while (lineNum < 8192 && fgets (inputFile[lineNum], MAXCOLS, fp))
    lineNum++;

// Populate the matrix in parallel; the thread index is passed by value
for (t = 0; t < NUM_THREADS; t++){
    pthread_create(&threads[t], NULL, ParallelRead, (void *)(intptr_t)t);
}

In the function ParallelRead, I parse each line, call atoi on each number, and populate the matrix. The parallelism is line-wise: thread t parses lines t, t + NUM_THREADS, t + 2 * NUM_THREADS, and so on.
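
A minimal sketch of the per-line parsing step described above. It assumes the numbers are separated by whitespace and fit in int8_t, as the matrix type in the question implies. parse_line is a hypothetical helper name, and strtol is used in place of atoi because it reports where each number ends.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse up to max_cols whitespace-separated integers from one text line
   into row. Returns the number of values stored. */
size_t parse_line(const char *line, int8_t *row, size_t max_cols)
{
    assert(line != NULL && row != NULL);
    size_t n = 0;
    const char *p = line;
    while (n < max_cols) {
        char *end;
        long v = strtol(p, &end, 10);  /* skips leading whitespace */
        if (end == p)                  /* no further number on this line */
            break;
        row[n++] = (int8_t)v;          /* assumption: v fits in int8_t */
        p = end;
    }
    return n;
}
```

A worker thread with index t would then call something like parse_line(inputFile[r], matrix[r], 8192) for r = t, t + NUM_THREADS, and so on.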

On a two-core system with 2 threads, this takes

Loading big file (fgets) : 5.79126
Preprocessing data (Parallel Read) : 4.44083

Is there a way to optimize this any further?

Answer 1:

It's a bad idea to do it this way. Threads can get you more CPU cycles if you have enough cores, but you still have only one hard disk. So threads cannot improve the speed of reading the file data.

They actually make it much worse. Reading data from a file is fastest when you access the file sequentially. That minimizes the number of reader head seeks, by far the most expensive operation on a disk drive. By splitting the reading across multiple threads, each reading a different part of the file, you are making the reader head constantly jump back and forth. Very, very bad for throughput.

Use only one thread to read file data. You might be able to overlap it with some computational cycles on the file data by starting a thread once a chunk of the file data is loaded.

Do watch out for the test effect. When you re-run your program, typically after tweaking your code somewhat, it is likely that the program can find file data back in the file system cache so it doesn't have to be read from the disk. That's very fast, memory bus speed, a memory-to-memory copy. Pretty likely on your dataset since it isn't very big and easily fits in the amount of RAM a modern machine has. This does not (typically) happen on a production machine. So be sure to clear out the cache to get realistic numbers, whatever it takes on your OS.



Answer 2:

One thing worth considering is allocating two smaller input buffers (say, 200 lines each).

Then have one thread read data into the input buffers. When one input buffer is full, pass it to a second thread that does the parsing. This second thread could use a thread pool for concurrent parsing (see OpenMP).

You will have to use locks/mutexes to ensure that each thread has exclusive access to the buffer it is working on.

This would be better because the parsing now runs concurrently with reading the file, and your memory accesses to the buffer are more local and more likely to fit into your CPU cache. This can improve both reading and parsing speed.
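
The two-buffer handoff described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names slot_t, pipeline_t and run_pipeline are hypothetical, the parsing step is a placeholder that just counts lines, a single parser thread stands in for the thread pool, and error checks on the pthread calls are omitted for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define LINES_PER_BUF 200          /* buffer size suggested in this answer */
#define LINE_LEN      (8192 * 4)   /* line length used in the question */

typedef struct {
    char (*lines)[LINE_LEN];  /* LINES_PER_BUF lines of text */
    int count;                /* valid lines; 0 signals end of file */
    int full;                 /* 1: owned by parser, 0: owned by reader */
} slot_t;

typedef struct {
    slot_t slot[2];
    pthread_mutex_t mu;
    pthread_cond_t cv;
    FILE *fp;
    long lines_parsed;
} pipeline_t;

/* Block until the slot's flag equals want. */
static void wait_for(pipeline_t *p, slot_t *s, int want)
{
    pthread_mutex_lock(&p->mu);
    while (s->full != want)
        pthread_cond_wait(&p->cv, &p->mu);
    pthread_mutex_unlock(&p->mu);
}

/* Hand the slot to the other thread by setting its flag. */
static void hand_over(pipeline_t *p, slot_t *s, int full)
{
    pthread_mutex_lock(&p->mu);
    s->full = full;
    pthread_cond_broadcast(&p->cv);
    pthread_mutex_unlock(&p->mu);
}

static void *reader(void *arg)
{
    pipeline_t *p = arg;
    for (int i = 0;; i ^= 1) {             /* alternate between the slots */
        slot_t *s = &p->slot[i];
        wait_for(p, s, 0);
        int n = 0;
        while (n < LINES_PER_BUF && fgets(s->lines[n], LINE_LEN, p->fp))
            n++;
        s->count = n;
        hand_over(p, s, 1);
        if (n == 0)
            return NULL;                   /* empty slot tells parser to stop */
    }
}

static void *parser(void *arg)
{
    pipeline_t *p = arg;
    for (int i = 0;; i ^= 1) {             /* same slot order as the reader */
        slot_t *s = &p->slot[i];
        wait_for(p, s, 1);
        if (s->count == 0)
            return NULL;
        p->lines_parsed += s->count;       /* placeholder for real parsing */
        hand_over(p, s, 0);
    }
}

/* Runs reader and parser concurrently; returns lines handled, or -1. */
long run_pipeline(FILE *fp)
{
    assert(fp != NULL);
    pipeline_t p = { .fp = fp };           /* remaining fields start at zero */
    pthread_t r, w;
    for (int i = 0; i < 2; i++)
        p.slot[i].lines = malloc(sizeof(char[LINES_PER_BUF][LINE_LEN]));
    if (!p.slot[0].lines || !p.slot[1].lines) {
        free(p.slot[0].lines);
        free(p.slot[1].lines);
        return -1;
    }
    pthread_mutex_init(&p.mu, NULL);
    pthread_cond_init(&p.cv, NULL);
    pthread_create(&r, NULL, reader, &p);
    pthread_create(&w, NULL, parser, &p);
    pthread_join(r, NULL);
    pthread_join(w, NULL);
    pthread_cond_destroy(&p.cv);
    pthread_mutex_destroy(&p.mu);
    free(p.slot[0].lines);
    free(p.slot[1].lines);
    return p.lines_parsed;
}
```

The mutex only guards the ownership flag; each thread touches a buffer's contents only while it owns that buffer, so the reader can fill one buffer while the parser works on the other.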

If fgets is the bottleneck, you can also read the file into memory as binary. This could improve read speed, but will require you to do extra parsing and will make the abovementioned optimization harder to carry out.



Answer 3:

Try a parent thread that loads the character array using something like fread, reading everything in one I/O call as one great big string.
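
A minimal sketch of that single-read approach, assuming the file was opened in binary mode and fits in memory. read_whole_file is a hypothetical helper name, not something from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire open file, from its start, into one NUL-terminated heap
   buffer. Returns NULL on failure; the caller frees the buffer. */
char *read_whole_file(FILE *fp, size_t *len_out)
{
    assert(fp != NULL && len_out != NULL);
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);             /* file size in bytes */
    if (size < 0)
        return NULL;
    rewind(fp);
    char *buf = malloc((size_t)size + 1);
    if (buf == NULL)
        return NULL;
    size_t got = fread(buf, 1, (size_t)size, fp);  /* one large read */
    buf[got] = '\0';                   /* terminate so string functions work */
    *len_out = got;
    return buf;
}
```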

Have the parent walk the string and find one line, or calculate where the first line ends based on sizes. Hand the processing of that line off to a thread. Move to the next line and repeat until EOF. Then sync with the threads, and you are done.
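
The line-finding walk could look something like this sketch, which records where each line starts so that lines can be handed to worker threads. index_lines is a hypothetical helper name, and lines are assumed to end with '\n'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record the start of each line in buf[0..len) into starts, up to
   max_lines entries. Returns the number of lines found. */
size_t index_lines(const char *buf, size_t len,
                   const char **starts, size_t max_lines)
{
    assert(buf != NULL && starts != NULL);
    size_t n = 0;
    const char *p = buf;
    const char *end = buf + len;
    while (p < end && n < max_lines) {
        starts[n++] = p;
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl == NULL)        /* last line has no trailing newline */
            break;
        p = nl + 1;            /* next line begins after the newline */
    }
    return n;
}
```

Each worker can then parse the lines assigned to it, starting from starts[i].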



Answer 4:

The best performance you can get with file I/O is via memory mapping. This is an example. I would start from a single threaded design and if post-load processing proves to be a bottleneck make it parallel.
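
For illustration, a minimal sketch of mapping a file read-only on a POSIX system. map_file_readonly is a hypothetical helper name, and whether this beats plain reads on a given machine is something to measure rather than assume.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an empty
   file. Release the mapping with munmap((void *)ptr, len). */
const char *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

The mapped bytes are not NUL-terminated, so a parser working on them has to stop at len rather than rely on string functions finding a terminator.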