Using the Task Parallel Library for I/O-bound processing

Posted 2019-09-21 15:13

Question:

I'm wondering if you could clarify something.

I am writing a tool that only has to retrieve data from a database (SQL Server) and create .txt files. I'm talking about 500,000 text files.

It's working and all is good.

However, I was wondering whether using the Task Parallel Library (TPL) could reduce the time it takes to create these files.

I have read that the TPL is not meant for I/O-bound processing and that it will most likely perform no better than sequential code.

Is this true?

Also, in an initial attempt using a simple Parallel.ForEach loop, I got an error saying the file could not be accessed because it is in use.

Any advice?
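One likely cause of that "file in use" error with Parallel.ForEach is two iterations opening the same path at the same time, for example when the file name comes from a variable shared between iterations. The following is a minimal sketch, not the original tool: it assumes each row has a unique Id that can serve as the file name, uses hypothetical in-memory rows in place of the SQL Server query, and caps concurrency with MaxDegreeOfParallelism (the value 4 is arbitrary). It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the SQL Server results: (Id, Content) pairs.
// Assumption: Id is unique per row, so it can be used as the file name.
var rows = Enumerable.Range(1, 1000)
    .Select(i => (Id: i, Content: $"Row {i}"))
    .ToList();

string outputDir = Path.Combine(Path.GetTempPath(), "tpl-export-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(outputDir);

Parallel.ForEach(
    rows,
    new ParallelOptions { MaxDegreeOfParallelism = 4 }, // arbitrary cap on concurrent writers
    row =>
    {
        // Every iteration builds its own path from the unique Id, so no two
        // iterations open the same file at the same time.
        string path = Path.Combine(outputDir, $"{row.Id}.txt");
        // WriteAllText opens, writes and closes the file in one call,
        // so no handle stays open for another iteration to collide with.
        File.WriteAllText(path, row.Content);
    });

int written = Directory.GetFiles(outputDir, "*.txt").Length;
Console.WriteLine($"Wrote {written} files to {outputDir}");
```

This only removes the file-name collision; whether it is faster than a plain foreach still depends on the disk.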

Answer 1:

You should not parallelize I/O-bound processes.

The reason is simple: the CPU is not the bottleneck. No matter how many threads you start, you only have ONE disk to write to, and that is the slowest component.
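Whether this holds on a given machine can be checked by timing both approaches. A rough sketch, assuming C# 9 or later (top-level statements); the file count and the 4 KB payload are arbitrary, and the files go to fresh temporary folders rather than the real export location.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

const int fileCount = 2000;             // arbitrary test size
string payload = new string('x', 4096); // arbitrary 4 KB file body

// Creates a fresh, empty temp folder so the two runs don't share files.
string NewDir(string name)
{
    string dir = Path.Combine(Path.GetTempPath(), $"{name}-{Guid.NewGuid():N}");
    Directory.CreateDirectory(dir);
    return dir;
}

// Measures the wall-clock time of a single run of the action.
TimeSpan Measure(Action action)
{
    var sw = Stopwatch.StartNew();
    action();
    return sw.Elapsed;
}

string seqDir = NewDir("seq");
TimeSpan sequentialTime = Measure(() =>
{
    for (int i = 0; i < fileCount; i++)
        File.WriteAllText(Path.Combine(seqDir, $"{i}.txt"), payload);
});

string parDir = NewDir("par");
TimeSpan parallelTime = Measure(() =>
    Parallel.For(0, fileCount, i =>
        File.WriteAllText(Path.Combine(parDir, $"{i}.txt"), payload)));

Console.WriteLine($"Sequential: {sequentialTime.TotalMilliseconds:F0} ms");
Console.WriteLine($"Parallel:   {parallelTime.TotalMilliseconds:F0} ms");
```

A single run is only indicative: results vary with the disk type (a spinning disk versus an SSD) and with OS write caching, so repeat the measurement a few times before drawing conclusions.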

So what you need to do is simply iterate over the files and write them one by one. You can start a separate worker thread to do this work, or use async I/O to keep the UI responsive.
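A minimal sketch of the sequential approach with async I/O, assuming .NET Core 2.0 or later for File.WriteAllTextAsync and C# 9 or later for top-level statements. WriteSequentiallyAsync is a hypothetical helper name, and the rows are an in-memory stand-in for the database results.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory rows standing in for the SQL Server query results.
var rows = Enumerable.Range(1, 500)
    .Select(i => (Id: i, Content: $"Row {i}"))
    .ToList();

string outputDir = Path.Combine(Path.GetTempPath(), "seq-export-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(outputDir);

// Writes the files strictly one after another. Awaiting each write means the
// calling thread (e.g. a UI thread) is not blocked while the write completes.
async Task<int> WriteSequentiallyAsync(IEnumerable<(int Id, string Content)> items, string dir)
{
    int count = 0;
    foreach (var item in items)
    {
        string path = Path.Combine(dir, $"{item.Id}.txt");
        await File.WriteAllTextAsync(path, item.Content);
        count++;
    }
    return count;
}

int written = await WriteSequentiallyAsync(rows, outputDir);
Console.WriteLine($"Wrote {written} files sequentially to {outputDir}");
```

For the worker-thread alternative, the same loop could use File.WriteAllText and be started with Task.Run, so the writing happens off the UI thread.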



Answer 2:

If you read from and/or write to multiple disks, then parallelizing could improve speed. For example, if you want to read all your files, compute a hash of each one, and store the hashes, you could create one thread per disk and you would see a significant speed-up. However, in your case it seems that tasks are unlikely to improve performance.
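A sketch of the one-worker-per-disk idea applied to the hashing example. It assumes .NET 5 or later (for Convert.ToHexString and top-level statements) and that the path root (such as C:\ or D:\) identifies a separate physical disk, which is not true for several partitions of one drive or for mount points. HashPerDiskAsync is a hypothetical name, SHA-256 is an arbitrary choice of hash, and it uses one Task per disk rather than a dedicated Thread, which gives the same grouping.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Groups files by path root (assumed to map to one physical disk), then runs
// one worker per group: files on the same disk are hashed one at a time,
// while different disks are processed concurrently.
async Task<Dictionary<string, string>> HashPerDiskAsync(IEnumerable<string> files)
{
    var workers = files
        .GroupBy(f => Path.GetPathRoot(Path.GetFullPath(f)))
        .Select(group => Task.Run(() =>
        {
            var results = new List<(string FilePath, string Hash)>();
            using var sha = SHA256.Create(); // one instance per worker, never shared
            foreach (string file in group)   // sequential within this disk
            {
                using var stream = File.OpenRead(file);
                results.Add((file, Convert.ToHexString(sha.ComputeHash(stream))));
            }
            return results;
        }))
        .ToList();

    var perDisk = await Task.WhenAll(workers);
    return perDisk.SelectMany(r => r).ToDictionary(r => r.FilePath, r => r.Hash);
}

// Usage: hash a few temporary files (all on one disk here, so one worker runs).
string dir = Path.Combine(Path.GetTempPath(), "hash-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
for (int i = 0; i < 3; i++)
    File.WriteAllText(Path.Combine(dir, $"{i}.txt"), $"file {i}");

Dictionary<string, string> hashes = await HashPerDiskAsync(Directory.GetFiles(dir));
foreach (var (file, hash) in hashes)
    Console.WriteLine($"{Path.GetFileName(file)}: {hash}");
```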