I have a file share crawler that collects permissions and stores them for a later audit. To speed up the process, it currently starts multiple threads to crawl the same folder.
In C#, each `SqlConnection` object has its own `SqlTransaction`, initiated by the `SqlConnection.BeginTransaction()` call.
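For reference, here is a minimal sketch of that one-connection, one-transaction pattern. The connection string, the `FolderPermissions` table, and its columns are made up for illustration, and the sketch assumes the `System.Data.SqlClient` provider is referenced:

```csharp
using System.Data.SqlClient;

static class TransactionSketch
{
    // Saves one row inside a transaction that belongs to this connection only.
    // Table and column names are hypothetical.
    public static void SaveInTransaction(string connectionString, string folderPath, string permissions)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (SqlTransaction transaction = connection.BeginTransaction())
            {
                try
                {
                    using (var command = new SqlCommand(
                        "INSERT INTO FolderPermissions (FolderPath, Permissions) VALUES (@path, @perms)",
                        connection, transaction))
                    {
                        command.Parameters.AddWithValue("@path", folderPath);
                        command.Parameters.AddWithValue("@perms", permissions);
                        command.ExecuteNonQuery();
                    }
                    transaction.Commit();   // makes the insert permanent
                }
                catch
                {
                    transaction.Rollback(); // undoes everything done on this connection
                    throw;
                }
            }
        }
    }
}
```

Because the transaction is tied to its connection, a rollback here cannot undo rows written by another thread on a different connection.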
Here is the pseudocode of the current solution:
- Get list of folders
- For each folder get list of sub-folders
- For each sub-folder, start a thread to collect file shares
- Each thread will save collected data to database
- Run Audit reports on the database
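To make the steps above concrete, here is a rough sketch of the current behaviour. It is not my real code: it uses `Task.Run` instead of raw threads, a `ConcurrentBag` as a stand-in for the database, and a hypothetical worker that fails for any sub-folder whose name contains "broken":

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class CurrentCrawlerSketch
{
    // Stand-in for the database: every worker saves its own rows independently,
    // the way each thread currently writes over its own connection.
    public static readonly ConcurrentBag<string> SavedRows = new ConcurrentBag<string>();

    // Hypothetical worker: "collects" one sub-folder and saves the result right away.
    static void CrawlSubFolder(string subFolder)
    {
        if (subFolder.Contains("broken"))
            throw new InvalidOperationException("Access denied: " + subFolder);
        SavedRows.Add(subFolder);
    }

    // Crawls all sub-folders of one folder in parallel and returns those that failed.
    public static List<string> CrawlFolder(IEnumerable<string> subFolders)
    {
        var jobs = subFolders
            .Select(sf => new { SubFolder = sf, Job = Task.Run(() => CrawlSubFolder(sf)) })
            .ToList();
        try
        {
            Task.WaitAll(jobs.Select(j => j.Job).ToArray());
        }
        catch (AggregateException)
        {
            // Failures are inspected below; rows from the other workers are already saved.
        }
        return jobs.Where(j => j.Job.IsFaulted).Select(j => j.SubFolder).ToList();
    }
}
```

Calling `CurrentCrawlerSketch.CrawlFolder(new[] { "share/a", "share/broken", "share/c" })` leaves two rows in `SavedRows` and reports one failure, so the folder is only partially stored.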
The problem arises when one of the sub-folder threads fails. We end up with a partially scanned folder, which cannot be detected easily. The main reason is that each thread runs on a separate connection.
I would like each folder to be committed in a single transaction, rather than ending up with an incomplete scan (the current situation when some threads fail). No transaction handling is implemented yet, but I am evaluating the options.
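The all-or-nothing behaviour I am after could look roughly like the sketch below. It is only one possible shape, under the assumption that workers return their rows instead of saving them and that a single writer then stores the whole folder. The in-memory `CommittedRows` list stands in for the database, and the worker and its failure rule are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class AllOrNothingSketch
{
    // Stand-in for the audit table; only the single writer below touches it.
    public static readonly List<string> CommittedRows = new List<string>();

    // Hypothetical worker: returns the rows for one sub-folder instead of saving them.
    static List<string> CollectSubFolder(string subFolder)
    {
        if (subFolder.Contains("broken"))
            throw new InvalidOperationException("Access denied: " + subFolder);
        return new List<string> { subFolder + ":permissions" };
    }

    // Returns true and stores every row if all sub-folders succeeded;
    // returns false and stores nothing otherwise.
    public static bool CrawlFolder(IEnumerable<string> subFolders)
    {
        Task<List<string>>[] tasks = subFolders
            .Select(sf => Task.Run(() => CollectSubFolder(sf)))
            .ToArray();
        try
        {
            Task.WaitAll(tasks);
        }
        catch (AggregateException)
        {
            return false; // at least one worker failed: discard the whole folder
        }
        // Single writer: in the real crawler, this is where one connection would
        // begin one transaction, insert all rows, and commit.
        CommittedRows.AddRange(tasks.SelectMany(t => t.Result));
        return true;
    }
}
```

This buffers the rows of one folder in memory until every worker has finished, so it only helps if a single folder's results fit in RAM.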
Based on the comments on this answer, a producer/consumer queue would be an option, but unfortunately memory is a constraint (due to the number of threads started). If the producer/consumer queue is spilled to disk to overcome the RAM limit, the execution time will go up (disk I/O being far slower than memory I/O). I guess I am stuck with a memory/time trade-off. Any other suggestions?