I am trying to remove a large number of files from a location (by large I mean over 100,000), where the action is initiated from a web page. Obviously I could just use
// GetFiles builds the entire array of paths in memory before returning.
string[] files = System.IO.Directory.GetFiles("path with files to delete");
foreach (var file in files)
{
    System.IO.File.Delete(file);
}
Directory.GetFiles: http://msdn.microsoft.com/en-us/library/wz42302f.aspx
This method has already been posted a few times: How to delete all files and folders in a directory? and Delete files from directory if filename contains a certain word.
But the problem with this method is that if you have, say, a hundred thousand files, it becomes a performance issue: it has to build the full array of file paths before it can even start looping through them.
Added to this, if a web page is waiting on a response from the method performing the delete then, as you can imagine, it will look a bit rubbish!
One thought I had was to wrap this up in an asynchronous web service call, and when it completes it fires back a response to the web page to say the files have been removed. Maybe put the delete method in a separate thread? Or maybe even use a separate batch process to perform the delete?
I have a similar issue when trying to count the number of files in a directory - if it contains a large number of files.
I was wondering if this is all a bit overkill? I.e. is there a simpler method to deal with this? Any help would be appreciated.
Can you put all your files in the same directory?
If so, why don't you just call
Directory.Delete(string,bool)
on the subdirectory you want to delete? If you've already got a list of file paths you want to get rid of, you might actually get better results by moving them to a temp dir and then deleting them, rather than deleting each file manually.
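For example (the paths here are hypothetical), assuming all the files live in their own subdirectory:

// Recursively delete the whole subdirectory in one call.
System.IO.Directory.Delete(@"C:\data\filesToDelete", true);

// Or, to get back to the user quickly: a move on the same volume is just a
// cheap rename, so get the directory out of the way first and delete it later.
System.IO.Directory.Move(@"C:\data\filesToDelete", @"C:\data\filesToDelete_trash");
System.IO.Directory.Delete(@"C:\data\filesToDelete_trash", true);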
Cheers, Florian
Boot the work out to a worker thread and then return your response to the user.
I'd flag up an application variable to say that you are running "the big delete job", to stop multiple threads doing the same work. You could then poll another page which could give you a progress update on the number of files removed so far, if you wanted. Something like the sketch below:
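A rough sketch of that idea in ASP.NET Web Forms (the variable names, path, and page wiring are my assumptions, not tested code):

protected void DeleteButton_Click(object sender, EventArgs e)
{
    var app = Application; // capture application state before the request ends

    // Check-and-set the flag under a lock so only one job ever starts.
    app.Lock();
    bool alreadyRunning = (app["BigDeleteRunning"] as bool?) == true;
    if (!alreadyRunning)
    {
        app["BigDeleteRunning"] = true;
        app["FilesDeleted"] = 0;
    }
    app.UnLock();
    if (alreadyRunning)
        return;

    System.Threading.ThreadPool.QueueUserWorkItem(delegate
    {
        try
        {
            int deleted = 0;
            foreach (var file in System.IO.Directory.GetFiles(@"path\with\files\to\delete"))
            {
                System.IO.File.Delete(file);
                deleted++;
                if (deleted % 100 == 0)
                    app["FilesDeleted"] = deleted; // cheap progress for the polling page
            }
            app["FilesDeleted"] = deleted;
        }
        finally
        {
            app["BigDeleteRunning"] = false;
        }
    });

    // Return immediately; a second page can poll Application["FilesDeleted"].
}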
Just a query, but why so many files?
GetFiles is extremely slow.

Below is a fast Win32 wrapper for GetFiles; use it in combination with a new Thread and an AJAX function, along the lines of:

GetFilesUnmanaged(@"C:\myDir", "*.txt").GetEnumerator().MoveNext()
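A sketch of what such a wrapper might look like (my reconstruction, using FindFirstFile/FindNextFile via P/Invoke; the point is that it streams one file name at a time instead of building the whole array up front):

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

public static class FastDirectory
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Auto)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll")]
    private static extern bool FindClose(IntPtr hFindFile);

    private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    // Yields paths lazily: the first result comes back as soon as
    // FindFirstFile returns, no matter how many files the folder holds.
    public static IEnumerable<string> GetFilesUnmanaged(string directory, string pattern)
    {
        WIN32_FIND_DATA findData;
        IntPtr handle = FindFirstFile(Path.Combine(directory, pattern), out findData);
        if (handle == INVALID_HANDLE_VALUE)
            yield break;
        try
        {
            do
            {
                if ((findData.dwFileAttributes & FileAttributes.Directory) == 0)
                    yield return Path.Combine(directory, findData.cFileName);
            }
            while (FindNextFile(handle, out findData));
        }
        finally
        {
            FindClose(handle);
        }
    }
}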
Usage: you could create a simple AJAX web method in your ASPX code-behind and call it with JavaScript.
Do it in a separate thread, or post a message to a queue (maybe MSMQ?) where another application (maybe a Windows service) is subscribed to that queue and performs the commands (e.g. "Delete e:\dir\*.txt") in its own process.
The message should probably just include the folder name. If you use something like NServiceBus and transactional queues, then you can post your message and return immediately as long as the message was posted successfully. If there is a problem actually processing the message, then it'll retry and eventually go on an error queue that you can watch and perform maintenance on.
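A minimal sketch of the posting side with System.Messaging (the queue path and label are my assumptions; the subscribed service would Receive from the same queue and do the actual delete):

using System.Messaging; // add a reference to System.Messaging.dll

public static class CleanupQueue
{
    // Hypothetical private transactional queue.
    private const string QueuePath = @".\private$\fileCleanup";

    public static void EnqueueDelete(string folder)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath, true); // true = transactional

        using (var queue = new MessageQueue(QueuePath))
        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            queue.Send(folder, "DeleteFolderRequest", tx);
            tx.Commit(); // returns as soon as the message is durably queued
        }
    }
}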
Having more than 1000 files in a directory is a huge problem.
If you are in the development stages now, you should consider putting in an algorithm which puts the files into a random folder (inside your root folder), while guaranteeing the number of files in that folder stays under 1024.
Something like the following.
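A minimal sketch (the helper name and the hash-based bucketing are my assumptions; any scheme that bounds the files per folder would do):

using System;
using System.IO;

public static class FileBuckets
{
    // Spread files across at most 1024 subfolders of the root. Hashing the
    // file name keeps the mapping deterministic, so the same name always
    // lands in (and can later be found in) the same folder.
    public static string GetBucketedPath(string rootFolder, string fileName)
    {
        const int bucketCount = 1024;
        int bucket = Math.Abs(fileName.GetHashCode() % bucketCount);
        string folder = Path.Combine(rootFolder, bucket.ToString("D4"));
        Directory.CreateDirectory(folder); // no-op if it already exists
        return Path.Combine(folder, fileName);
    }
}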
While doing this, also make sure that each time you create a file, you simultaneously add its path to a HashMap or list. Periodically serialize this with something like JSON.NET to the file system (for integrity's sake, so that even if your service fails, you can recover the file list from the serialized form).
When you want to clean up the files or query among them, first do a lookup in this HashMap or list and then act on the file. This is better than calling System.IO.Directory.GetFiles.
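For illustration, a sketch of that bookkeeping with JSON.NET (the class and member names are mine):

using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class FileIndex
{
    private HashSet<string> paths = new HashSet<string>();

    public void Add(string path) { paths.Add(path); }
    public bool Contains(string path) { return paths.Contains(path); }
    public void Remove(string path) { paths.Remove(path); }

    // Call this periodically so the index survives a crash.
    public void Save(string indexFile)
    {
        File.WriteAllText(indexFile, JsonConvert.SerializeObject(paths));
    }

    public static FileIndex Load(string indexFile)
    {
        var index = new FileIndex();
        if (File.Exists(indexFile))
            index.paths = JsonConvert.DeserializeObject<HashSet<string>>(
                File.ReadAllText(indexFile));
        return index;
    }
}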