I have an application that we are developing using .NET 4.0 and EF 6.0. The premise of the program is quite simple: watch a particular folder on the file system. As a new file gets dropped into this folder, look up information about this file in the SQL Server database (using EF), and then, based on what is found, move the file to another folder on the file system. Once the file move is complete, go back to the DB and update the information about this file (register the file move).
These are large media files so it might take a while for each of them to move to the target location. Also, we might start this service with hundreds of these media files sitting in the source folder already that will need to be dispatched to the target location(s).
So to speed things up, I started out using the Task Parallel Library (async/await is not available, as this is .NET 4.0). For each file in the source folder, I look up info about it in the DB, determine which target folder it needs to move to, and then start a new task that begins to move the file…
LookupFileinfoinDB(filename)
{
// use EF DB Context to look up file in DB
}
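// start a new task to begin the file move
var moveFileTask = Task<bool>.Factory.StartNew(
    () =>
    {
        var success = false;
        try
        {
            // the code that actually moves the file goes here…
            .......
        }
        catch (IOException)
        {
            // assumed handling: a failed move is reported as false
            success = false;
        }
        return success;
    });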
Now, once this task completes, I have to go back to the DB and update the info about the file, and that is where I am running into problems. (Keep in mind that I might have several of these 'move file' tasks running in parallel, and they will finish at different times.) Currently, I am using task continuations to register the file move in the DB:
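moveFileTask.ContinueWith(
    t =>
    {
        // t.Result rethrows if the move task faulted, so check the status first
        if (t.Status == TaskStatus.RanToCompletion && t.Result)
        {
            RegisterFileMoveinDB();
        }
    });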
The problem is that I am using the same DB context for looking up the file info in the main task and inside the RegisterFileMoveinDB() method, which executes later on the continuation task. I was getting all kinds of weird exceptions (mostly about the SQL Server data reader) when moving several files together. Searching online revealed that sharing a DB context among several tasks, as I am doing here, is a big no-no because EF is not thread safe.
I would rather not create a new DB context for each file move, as there could be dozens or even hundreds of them going at the same time. What would be a good alternative approach? Is there a way to 'signal' the main task when a nested task completes and finish the file move registration in the main task? Or am I approaching this problem the wrong way altogether, and is there a better way to go about it?
Your best bet is to scope your DbContext to each thread. Parallel.ForEach has overloads that are useful for this (the overloads that take a Func<TLocal> localInit parameter, which runs once per worker task and can create that task's own context).
You can call SaveChanges() within the body expression/RegisterFileMoveInDB if you prefer to have the DB updated ASAP. I would suggest tying the file system operations in with the DB transaction so that if the DB update fails, the file system operations are rolled back.
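To make the per-thread scoping above concrete, here is a minimal sketch of the localInit/localFinally overload. It is only an illustration under some assumptions: MediaContext is a hypothetical stand-in for your DbContext subclass (stubbed so the sample compiles without EF), and LookupTargetFolder, RegisterFileMove, FileDispatcher and the moveFile delegate are made-up names, not anything from your code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an EF DbContext subclass; swap in your real context.
public sealed class MediaContext : IDisposable
{
    private int pending;

    // Stand-in for the EF query that finds where the file should go.
    public string LookupTargetFolder(string fileName) { return "target"; }

    // Stand-in for updating the tracked entity for this file.
    public void RegisterFileMove(string fileName) { pending++; }

    // Stand-in for DbContext.SaveChanges(); returns the number of rows "written".
    public int SaveChanges() { int n = pending; pending = 0; return n; }

    public void Dispose() { }
}

public static class FileDispatcher
{
    // Moves every file and returns how many moves were registered.
    public static int Dispatch(IEnumerable<string> files,
                               Func<string, string, bool> moveFile)
    {
        int registered = 0;

        Parallel.ForEach<string, MediaContext>(
            files,
            // localInit: runs once per worker task, so a context is never shared.
            () => new MediaContext(),
            // body: lookup, move and registration all use this task's own context.
            (file, loopState, context) =>
            {
                string target = context.LookupTargetFolder(file);
                if (moveFile(file, target))
                {
                    context.RegisterFileMove(file);
                    Interlocked.Add(ref registered, context.SaveChanges());
                }
                return context; // handed to the next iteration on the same task
            },
            // localFinally: runs once per worker task after its last item.
            context => context.Dispose());

        return registered;
    }
}
```

You would call it with something like FileDispatcher.Dispatch(Directory.GetFiles(sourceFolder), (file, target) => TryMove(file, target)), where TryMove is whatever you already use to move a file. Note that the loop may create more worker tasks than you have cores, so "one context per thread" is really "one context per worker task"; if that matters, the overloads that also take a ParallelOptions let you cap it via MaxDegreeOfParallelism.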