Consider a really huge file on disk (maybe more than 4 GB). I want to scan through this file and count the number of times a specific binary pattern occurs.
My thought is:
Use a memory-mapped file (CreateFileMapping or Boost's mapped_file) to map the file into virtual memory.
For each 100 MB of mapped memory, create one thread to scan it and tally the matches (see the sketch below for the mapped scan itself).
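To illustrate the mapped part (without the threading yet), here is a minimal single-threaded sketch using Boost's mapped_file_source. The filename and pattern bytes are placeholders, and it assumes a 64-bit build so the whole file fits in the address space:

```cpp
// Minimal sketch, assuming Boost.Iostreams (link with -lboost_iostreams)
// and a 64-bit process so a >4 GB file fits in the address space.
#include <boost/iostreams/device/mapped_file.hpp>
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
    boost::iostreams::mapped_file_source file("huge.bin");  // placeholder name
    const char pattern[] = {0x12, 0x34, 0x56, 0x78};        // example pattern

    const char* begin = file.data();
    const char* end = begin + file.size();

    // Count occurrences (overlapping matches included).
    std::uint64_t count = 0;
    for (const char* p = begin;
         (p = std::search(p, end, pattern, pattern + sizeof(pattern))) != end;
         ++p)
        ++count;

    std::cout << "occurrences: " << count << '\n';
}
```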
Is this feasible? Is there a better method?
Update:
A memory-mapped file turned out to be a good choice: scanning through a 1.6 GB file completed within 11 s.
Thanks.
Tim Bray (and his readers) explored this in depth in his Wide Finder Project and Wide Finder 2. Benchmark results showed that multithreaded implementations can outperform a single-threaded solution on a massive Sun multicore server, but on typical PC hardware, multithreading won't gain you much, if anything.
Although you can use memory mapping, you don't have to. If you read the file sequentially in small chunks, say 1 MB each, the file will never be present in memory all at once.
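For example, a chunked scan might look like the sketch below (function and variable names are my own). One detail to get right: a match can straddle a chunk boundary, so the last pattern-size-minus-one bytes of each chunk are carried over and searched again with the next one:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Sketch: count occurrences of a non-empty pattern, reading 1 MB at a time.
// The last pattern.size() - 1 bytes of each chunk are kept so a match that
// straddles a chunk boundary is not missed; since the carried tail is
// shorter than the pattern, nothing gets counted twice either.
std::uint64_t count_pattern(const std::string& path,
                            const std::vector<char>& pattern) {
    std::ifstream in(path, std::ios::binary);
    const std::size_t chunk_size = 1 << 20;   // 1 MB per read
    const std::size_t overlap = pattern.size() - 1;
    std::vector<char> buf(overlap + chunk_size);
    std::size_t carried = 0;                  // bytes kept from the last chunk
    std::uint64_t count = 0;

    while (in.read(buf.data() + carried, chunk_size) || in.gcount() > 0) {
        const std::size_t len = carried + static_cast<std::size_t>(in.gcount());
        const char* begin = buf.data();
        const char* end = begin + len;
        for (const char* p = begin;
             (p = std::search(p, end, pattern.begin(), pattern.end())) != end;
             ++p)
            ++count;
        // Carry the tail over; memmove because the ranges may overlap.
        carried = std::min(overlap, len);
        std::memmove(buf.data(), end - carried, carried);
    }
    return count;
}
```

With this layout the memory footprint stays at roughly one chunk no matter how large the file is.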
If your search code is actually slower than your hard disk, you can still hand chunks off to worker threads if you like.
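One way to do that, sketched with the same placeholder names: keep the reading sequential and fan the searching out with std::async. The boundary overlap from the previous sketch is omitted here for brevity, and a real implementation would cap the number of in-flight tasks (or use a thread pool):

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Worker task: count (possibly overlapping) matches in one chunk.
std::uint64_t count_in_chunk(std::vector<char> chunk, std::string pattern) {
    std::uint64_t n = 0;
    for (auto it = chunk.cbegin();
         (it = std::search(it, chunk.cend(),
                           pattern.cbegin(), pattern.cend())) != chunk.cend();
         ++it)
        ++n;
    return n;
}

int main() {
    const std::string pattern = "\x12\x34\x56\x78";   // example pattern
    std::ifstream in("huge.bin", std::ios::binary);   // placeholder name
    const std::size_t chunk_size = 16 << 20;          // 16 MB per task

    std::vector<std::future<std::uint64_t>> results;
    std::vector<char> buf(chunk_size);
    while (in.read(buf.data(), chunk_size) || in.gcount() > 0) {
        buf.resize(static_cast<std::size_t>(in.gcount()));
        // The chunk is moved into the task: reading stays sequential,
        // searching runs in parallel.
        results.push_back(std::async(std::launch::async, count_in_chunk,
                                     std::move(buf), pattern));
        buf.assign(chunk_size, 0);                    // fresh buffer
    }

    std::uint64_t total = 0;
    for (auto& f : results) total += f.get();
    std::cout << "occurrences: " << total << '\n';
}
```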