Given the following setup in TPL Dataflow:
var directory = new DirectoryInfo(@"C:\dev\kortforsyningen_dsm\tiles");
var dirBroadcast = new BroadcastBlock<DirectoryInfo>(dir => dir);
var dirfinder = new TransformManyBlock<DirectoryInfo, DirectoryInfo>((dir) =>
{
    // Expand the directory that was posted, not the captured root.
    return dir.GetDirectories();
});
var tileFilder = new TransformManyBlock<DirectoryInfo, FileInfo>((dir) =>
{
    // List the files of the directory that was posted, not the captured root.
    return dir.GetFiles();
});
dirBroadcast.LinkTo(dirfinder);
dirBroadcast.LinkTo(tileFilder);
dirfinder.LinkTo(dirBroadcast);
var block = new XYZTileCombinerBlock<FileInfo>(3, (file) =>
    {
        var coordinate = file.FullName.Split('\\').Reverse().Take(3).Reverse().Select(s => int.Parse(Path.GetFileNameWithoutExtension(s))).ToArray();
        return XYZTileCombinerBlock<CloudBlockBlob>.TileXYToQuadKey(coordinate[0], coordinate[1], coordinate[2]);
    },
    (quad) =>
        XYZTileCombinerBlock<FileInfo>.QuadKeyToTileXY(quad,
            (z, x, y) => new FileInfo(Path.Combine(directory.FullName, string.Format("{0}/{1}/{2}.png", z, x, y)))),
    () => new TransformBlock<string, string>((s) =>
    {
        Trace.TraceInformation("Combining {0}", s);
        return s;
    }));
tileFilder.LinkTo(block);
using (new TraceTimer("Time"))
{
    dirBroadcast.Post(directory);
    block.LinkTo(new ActionBlock<FileInfo>((s) =>
    {
        Trace.TraceInformation("Done combining : {0}", s.Name);
    }));
    block.Complete();
    block.Completion.Wait();
}
I am wondering how I can mark this as complete, given the cycle. A directory is posted to the dirBroadcast broadcaster, which posts to the dirfinder, which might post new directories back to the broadcaster. So I can't simply mark it as complete, because that would block any directories being added from the dirfinder. Should I redesign it to keep track of the number of directories, or is there anything for this in TPL?
Just to show my real answer, a combination of TPL and Rx.
where block is my var block = new XYZTileCombinerBlock<FileInfo>.
I don't see any way this can be done, because each block (dirBroadcast and tileFilder) depends on the other one and can't complete on its own.
I suggest you redesign your directory traversal without TPL Dataflow, which isn't a good fit for this kind of problem. A better approach in my opinion would simply be to recursively scan the directories and fill your block with a stream of files:
I am sure this is not always possible, but in many cases (including directory enumeration) you can use a running counter and the Interlocked functions to have a cyclic one-to-many dataflow that completes:
I have used this with a slight modification to enumerate files, and it works well. Be careful with the max degree of parallelism; this can quickly saturate a network file system!
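The running-counter idea can be sketched roughly as follows. This is only an illustration under some assumptions, not the code from the answer above: the class name CyclicTraversal, the method EnumerateFilesAsync, and the ConcurrentBag used to collect results are all made up for the example. A single ActionBlock feeds subdirectories back to itself, and a counter of directories still in flight decides when it is safe to call Complete().

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CyclicTraversal
{
    // Hypothetical helper: collects every file under root using a
    // self-feeding ActionBlock that completes when no work is left.
    public static async Task<ConcurrentBag<FileInfo>> EnumerateFilesAsync(
        DirectoryInfo root, int maxDegreeOfParallelism = 4)
    {
        var files = new ConcurrentBag<FileInfo>();
        int pending = 1; // the root directory counts as already in flight
        ActionBlock<DirectoryInfo> dirBlock = null;

        dirBlock = new ActionBlock<DirectoryInfo>(dir =>
        {
            try
            {
                foreach (var file in dir.GetFiles())
                    files.Add(file);

                foreach (var sub in dir.GetDirectories())
                {
                    // Count the child before posting it, so the counter
                    // cannot reach zero while work is still queued.
                    Interlocked.Increment(ref pending);
                    dirBlock.Post(sub);
                }
            }
            finally
            {
                // This directory is finished; whoever brings the counter
                // to zero is the last one and completes the block.
                if (Interlocked.Decrement(ref pending) == 0)
                    dirBlock.Complete();
            }
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        });

        dirBlock.Post(root);
        await dirBlock.Completion; // rethrows if a directory could not be read
        return files;
    }
}
```

In the question's setup, the files would presumably be posted to block instead of being added to a bag, and block.Complete() would be called once dirBlock.Completion has finished.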
If the purpose of your code is to traverse the directory structure using some sort of parallelism, then I would suggest not using TPL Dataflow and using Microsoft's Reactive Framework instead. I think it becomes much simpler.
Here's how I would do it.
First define a recursive function to build the list of directories:
This recurses through the directories and uses the default Rx scheduler, which causes the observable to run in parallel.
So by calling recurse with an input DirectoryInfo I get an observable list of the input directory and all of its descendants.
Now I can build a fairly straightforward query to get the results I want:
Now I can action the query like this:
Now I may have missed a little bit in your custom code, but if this is an approach you want to take, I'm sure you can fix any logical issues quite easily.
This code automatically handles completion when it runs out of child directories and files.
To add Rx to your project look for "Rx-Main" in NuGet (the package has since been renamed to System.Reactive).
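The three steps described above (the recursive function, the query, and running it) could look something like this. It is a minimal sketch assuming the System.Reactive package is referenced; the class name RxTraversal and the methods Recurse, Files, and Run are invented for the example and are not the original snippets, and the quad-key logic from the question is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

public static class RxTraversal
{
    // Emits the directory itself, then every descendant directory.
    // Defer keeps GetDirectories() from running until subscription, and
    // Scheduler.Default moves the enumeration onto the thread pool.
    public static IObservable<DirectoryInfo> Recurse(DirectoryInfo dir) =>
        Observable
            .Return(dir)
            .Concat(
                Observable
                    .Defer(() => dir.GetDirectories().ToObservable(Scheduler.Default))
                    .SelectMany(child => Recurse(child)));

    // Query: every file in the root directory and all of its descendants.
    public static IObservable<FileInfo> Files(DirectoryInfo root) =>
        from dir in Recurse(root)
        from file in dir.GetFiles().ToObservable()
        select file;

    // Runs the query and blocks until the sequence completes, which
    // happens on its own once there are no more directories or files.
    public static IList<FileInfo> Run(DirectoryInfo root) =>
        Files(root).ToList().Wait();
}
```

Calling RxTraversal.Run(new DirectoryInfo(path)) returns once the traversal has finished. To feed the files into a Dataflow block instead, Files(root).Subscribe(...) with an OnNext that posts each file and an OnCompleted that calls Complete() on the block is one option.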