Processing Folder With Multiple Files Using FileSy

Posted 2020-03-31 07:54

Question:

I have created a relatively simple Windows app that watches a folder for files. When a new file is created in the folder, the app (via FileSystemWatcher) opens the file and processes the contents. Long story short, the contents are used with Selenium to automate a web page via IE11. This processing takes about 20 seconds per file.

The problem is that if more than one file is created in the folder at roughly the same time, or while the app is processing a file, the FileSystemWatcher onCreated handler does not see the next file. So when processing completes on the first file, the app just stops, while a file sits in the folder unprocessed. If a file is added after the onCreated processing has finished, it works fine and processes that file.

Can someone please guide me towards what I should be looking at to solve this? Excessive detail is very welcome.

Answer 1:

FileSystemWatcher (as you've already noticed) is not reliable on its own; you will always have to add custom, manual logic to catch missed files. Also note that you might see more than one event for the same file.

Below is a simple example with a "background" check for unprocessed files.
You could avoid the locks by using concurrent collections, e.g. BlockingCollection.
You could also choose to process your files in parallel.
I'm processing the files on a timer, but you could use your own strategy.
If you don't need to process the files in real time, you probably don't even need the FileSystemWatcher.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

namespace ConsoleAppDemo
{
    class Program
    {
        private static object lockIbj = new object();
        private static List<string> _proccessedFiles = new List<string>();
        private static readonly List<string> toProccessFiles = new List<string>();
        private static List<string> _proccessingFiles = new List<string>();
        private const string directory = @"C:\Path";
        private const string extension = @"*.txt";
        static void Main(string[] args)
        {
            FileSystemWatcher f = new FileSystemWatcher();
            f.Path = directory;
            f.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
                             | NotifyFilters.FileName | NotifyFilters.DirectoryName;
            f.Filter = extension ;
            f.Created += F_Created;
            f.EnableRaisingEvents = true;

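            // Hold the timers in using blocks so they stay referenced while Main waits;
            // a System.Threading.Timer that is no longer referenced can be collected and stop firing.
            using (Timer manualWatcher = new Timer(ManuallWatcherCallback, null, 0, 3000))
            using (Timer manualTaskRunner = new Timer(ManuallRunnerCallback, null, 0, 10000))
            {
                Console.ReadLine();
            }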
        }

        private static void F_Created(object sender, FileSystemEventArgs e)
        {
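            lock (lockIbj)
            {
                // The watcher can raise more than one event per file, and the timer scan
                // may already have queued it, so skip files we already know about.
                if (_proccessedFiles.Contains(e.FullPath) || toProccessFiles.Contains(e.FullPath) || _proccessingFiles.Contains(e.FullPath))
                    return;

                toProccessFiles.Add(e.FullPath);
                Console.WriteLine("Adding new File from watcher: " + e.FullPath);
            }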

        }

        private static void ManuallWatcherCallback(Object o)
        {
            var files = Directory.GetFiles(directory, extension);
            lock (lockIbj)
            {
                foreach (var file in files)
                {
                    if (!_proccessedFiles.Contains(file) && !toProccessFiles.Contains(file) && !_proccessingFiles.Contains(file))
                    {
                        toProccessFiles.Add(file);
                        Console.WriteLine("Adding new File from manual timer: " + file);
                    }
                }

            }
        }

        private static bool processing;
        private static void ManuallRunnerCallback(Object o)
        {
            // Check and set the flag under the lock so two overlapping timer ticks
            // cannot both start draining the queue.
            lock (lockIbj)
            {
                if (processing)
                    return;
                processing = true;
            }

            try
            {
                while (true)
                {
                    // You could process files in parallel here.
                    string fileToProcces;

                    lock (lockIbj)
                    {
                        fileToProcces = toProccessFiles.FirstOrDefault();
                        if (fileToProcces == null)
                            break; // queue is empty, wait for the next tick

                        toProccessFiles.Remove(fileToProcces);
                        _proccessingFiles.Add(fileToProcces);
                    }

                    // Must add error handling
                    ProccessFile(fileToProcces);
                }
            }
            finally
            {
                // Always clear the flag, otherwise one failure would block every later tick.
                lock (lockIbj)
                {
                    processing = false;
                }
            }
        }

        private static void ProccessFile(string fileToProcces)
        {
            Console.WriteLine("Processing:" + fileToProcces);
            lock (lockIbj)
            {
                _proccessingFiles.Remove(fileToProcces);
                _proccessedFiles.Add(fileToProcces);
            }
        }
    }
}
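
The remark above about replacing the locks with a concurrent collection could look roughly like the following. This is only a sketch: FileQueue, Enqueue and EnqueueExisting are hypothetical names that do not appear in the answer, and it assumes a single consumer thread is enough because each file takes about 20 seconds to process.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper (not part of the answer above): a single-consumer file queue.
public sealed class FileQueue : IDisposable
{
    private readonly BlockingCollection<string> pending = new BlockingCollection<string>();

    // Every path ever queued; TryAdd doubles as a thread-safe "seen before?" check.
    private readonly ConcurrentDictionary<string, bool> seen =
        new ConcurrentDictionary<string, bool>(StringComparer.OrdinalIgnoreCase);

    private readonly Task worker;

    public FileQueue(Action<string> processFile)
    {
        // One long-running consumer drains the queue in arrival order.
        worker = Task.Factory.StartNew(() =>
        {
            foreach (var path in pending.GetConsumingEnumerable())
            {
                try { processFile(path); }
                catch (Exception ex) { Console.Error.WriteLine("Failed on " + path + ": " + ex.Message); }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Safe to call from the watcher's Created handler and from a periodic scan.
    // Returns false when the path was already queued earlier.
    public bool Enqueue(string path)
    {
        if (!seen.TryAdd(path, true)) return false;
        pending.Add(path);
        return true;
    }

    // Fallback scan that picks up files whose Created event was missed.
    public int EnqueueExisting(string directory, string pattern)
    {
        var added = 0;
        foreach (var file in Directory.GetFiles(directory, pattern))
            if (Enqueue(file)) added++;
        return added;
    }

    // Stops accepting new files and waits for the queued ones to finish.
    public void Dispose()
    {
        pending.CompleteAdding();
        worker.Wait();
        pending.Dispose();
    }
}
```

With something like this, the Created handler would only call Enqueue(e.FullPath) and return immediately, and a timer would call EnqueueExisting every few seconds to catch anything the watcher dropped.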


Answer 2:

Instead of using FileSystemWatcher, you can use P/Invoke to run the Win32 File System Change Notification functions, and loop through the file system changes as they occur:

[DllImport("kernel32.dll", EntryPoint = "FindFirstChangeNotification")]
static extern System.IntPtr FindFirstChangeNotification (string lpPathName, bool bWatchSubtree, uint dwNotifyFilter);

[DllImport("kernel32.dll", EntryPoint = "FindNextChangeNotification")]
static extern bool FindNextChangeNotification (System.IntPtr hChangedHandle);

[DllImport("kernel32.dll", EntryPoint = "FindCloseChangeNotification")]
static extern bool FindCloseChangeNotification (System.IntPtr hChangedHandle);

[DllImport("kernel32.dll", EntryPoint = "WaitForSingleObject")]
static extern uint WaitForSingleObject (System.IntPtr handle, uint dwMilliseconds);

[DllImport("kernel32.dll", EntryPoint = "ReadDirectoryChangesW")]
static extern bool ReadDirectoryChangesW(System.IntPtr hDirectory, System.IntPtr lpBuffer, uint nBufferLength, bool bWatchSubtree, uint dwNotifyFilter, out uint lpBytesReturned, System.IntPtr lpOverlapped, System.IntPtr lpCompletionRoutine);

Basically, you call FindFirstChangeNotification with the directory you want to monitor, which gives you a waitable handle. You then call WaitForSingleObject with the handle, and when it returns, you know that one or more changes have occurred. Then you call ReadDirectoryChangesW (on a directory handle opened with CreateFile) to find out what has changed, and process the changes. Calling FindNextChangeNotification re-arms the same handle for the next change to the file system, so you'll likely call this, then WaitForSingleObject, then ReadDirectoryChangesW, in a loop. When you're done, you call FindCloseChangeNotification to stop tracking changes.

EDIT: Here is a more complete example:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using static NativeMethods;

// P/Invoke declarations must live inside a type; "using static" above lets the
// rest of the file call them without a NativeMethods. prefix.
static class NativeMethods
{
    [DllImport("kernel32.dll", EntryPoint = "FindFirstChangeNotification", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern System.IntPtr FindFirstChangeNotification(string lpPathName, bool bWatchSubtree, uint dwNotifyFilter);

    [DllImport("kernel32.dll", EntryPoint = "FindNextChangeNotification", SetLastError = true)]
    public static extern bool FindNextChangeNotification(System.IntPtr hChangedHandle);

    [DllImport("kernel32.dll", EntryPoint = "FindCloseChangeNotification", SetLastError = true)]
    public static extern bool FindCloseChangeNotification(System.IntPtr hChangedHandle);

    [DllImport("kernel32.dll", EntryPoint = "WaitForSingleObject", SetLastError = true)]
    public static extern uint WaitForSingleObject(System.IntPtr handle, uint dwMilliseconds);

    [DllImport("kernel32.dll", EntryPoint = "ReadDirectoryChangesW", SetLastError = true)]
    public static extern bool ReadDirectoryChangesW(System.IntPtr hDirectory, System.IntPtr lpBuffer, uint nBufferLength, bool bWatchSubtree, uint dwNotifyFilter, out uint lpBytesReturned, System.IntPtr lpOverlapped, IntPtr lpCompletionRoutine);

    [DllImport("kernel32.dll", EntryPoint = "CreateFile", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr CreateFile(string lpFileName, uint dwDesiredAccess, uint dwShareMode, IntPtr SecurityAttributes, uint dwCreationDisposition, uint dwFlagsAndAttributes, IntPtr hTemplateFile);
}

[Flags] // values are combined with | to build the notify filter
enum FileSystemNotifications
{
    FileNameChanged = 0x00000001,
    DirectoryNameChanged = 0x00000002,
    FileAttributesChanged = 0x00000004,
    FileSizeChanged = 0x00000008,
    FileModified = 0x00000010,
    FileSecurityChanged = 0x00000100,
}

enum FileActions
{
    FileAdded = 0x00000001,
    FileRemoved = 0x00000002,
    FileModified = 0x00000003,
    FileRenamedOld = 0x00000004,
    FileRenamedNew = 0x00000005
}

enum FileEventType
{
    FileAdded,
    FileChanged,
    FileDeleted,
    FileRenamed
}

class FileEvent
{
    private readonly FileEventType eventType;
    private readonly FileInfo file;

    public FileEvent(string fileName, FileEventType eventType)
    {
        this.file = new FileInfo(fileName);
        this.eventType = eventType;
    }

    public FileEventType EventType => eventType;
    public FileInfo File => file;
}

// Fixed-size header of a Win32 FILE_NOTIFY_INFORMATION record. The file name is a
// variable-length UTF-16 string stored inline right after these three fields.
[StructLayout(LayoutKind.Sequential)]
struct FileNotifyInformation
{
    public int NextEntryOffset; // bytes to the next record, 0 for the last one
    public int Action;          // one of the FileActions values
    public int FileNameLength;  // length of the name in bytes, not characters
}

class DirectoryWatcher
{
    private const int BufferSize = 64 * 1024; // bytes available for change records
    private static readonly IntPtr InvalidHandle = new IntPtr(-1);
    private readonly DirectoryInfo directory;

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool CloseHandle(IntPtr hObject);

    public DirectoryWatcher(string dirPath)
    {
        this.directory = new DirectoryInfo(dirPath);
    }

    public IEnumerable<FileEvent> Watch(bool watchSubFolders = false)
    {
        // GENERIC_READ, share read/write/delete, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS
        // (the last flag is required to open a directory rather than a file).
        var directoryHandle = CreateFile(directory.FullName, 0x80000000, 0x00000007, IntPtr.Zero, 3, 0x02000000, IntPtr.Zero);
        var filter = (uint)(FileSystemNotifications.FileNameChanged | FileSystemNotifications.FileModified);
        var waitable = FindFirstChangeNotification(directory.FullName, watchSubFolders, filter);
        if (directoryHandle == InvalidHandle || waitable == InvalidHandle)
            throw new IOException("Could not open " + directory.FullName + " for change notifications.");

        var headerSize = Marshal.SizeOf(typeof(FileNotifyInformation));
        var buffer = Marshal.AllocHGlobal(BufferSize);

        try
        {
            do
            {
                WaitForSingleObject(waitable, 0xFFFFFFFF); // Infinite wait
                uint bytesReturned;

                // Called synchronously, ReadDirectoryChangesW itself blocks until at least one
                // change has been recorded for this directory handle.
                if (!ReadDirectoryChangesW(directoryHandle, buffer, (uint)BufferSize, watchSubFolders, filter, out bytesReturned, IntPtr.Zero, IntPtr.Zero) || bytesReturned == 0)
                    continue;

                // Records are variable length, so follow NextEntryOffset instead of indexing an array.
                var offset = 0;
                while (true)
                {
                    var entry = IntPtr.Add(buffer, offset);
                    var header = Marshal.PtrToStructure<FileNotifyInformation>(entry);
                    var name = Marshal.PtrToStringUni(IntPtr.Add(entry, headerSize), header.FileNameLength / 2);
                    var path = Path.Combine(directory.FullName, name);

                    // Action is a plain value (1..5), not a bit mask, so compare for equality.
                    FileEventType? eventType = null;
                    switch ((FileActions)header.Action)
                    {
                        case FileActions.FileAdded: eventType = FileEventType.FileAdded; break;
                        case FileActions.FileModified: eventType = FileEventType.FileChanged; break;
                        case FileActions.FileRemoved: eventType = FileEventType.FileDeleted; break;
                        case FileActions.FileRenamedNew: eventType = FileEventType.FileRenamed; break;
                    }

                    if (eventType.HasValue)
                        yield return new FileEvent(path, eventType.Value);

                    if (header.NextEntryOffset == 0)
                        break;
                    offset += header.NextEntryOffset;
                }
            } while (FindNextChangeNotification(waitable));
        }
        finally
        {
            // Runs when the loop ends or the caller stops enumerating.
            Marshal.FreeHGlobal(buffer);
            FindCloseChangeNotification(waitable);
            CloseHandle(directoryHandle);
        }
    }
}

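class Program
{
    static void Main()
    {
        var watcher = new DirectoryWatcher(@"C:\Temp");

        // Watch() blocks between changes, so this loop runs until the process exits.
        foreach (var change in watcher.Watch())
        {
            Console.WriteLine("File {0} was {1}", change.File.Name, change.EventType);
        }
    }
}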


Answer 3:

I have done this before and ran into the same problem, but I no longer have the source code (previous job). I wound up creating a BackgroundWorker instance that would check the folder for new files. I would process the files, then archive them into a subfolder. Not sure if that's a possibility for you or not.
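
A rough sketch of that scan, process, then archive loop is shown here. The ArchivingProcessor class, the ProcessAndArchive method and the "Archive" folder name are all made up for illustration, since the original code is not available, and the sketch assumes each file is completely written before the scan runs.

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating the "process, then move to a subfolder" idea.
public static class ArchivingProcessor
{
    // Processes every matching file directly inside `inbox`, then moves it into an
    // "Archive" subfolder so the next scan does not pick it up again.
    // Returns the number of files handled in this pass.
    public static int ProcessAndArchive(string inbox, string pattern, Action<string> processFile)
    {
        var archive = Path.Combine(inbox, "Archive"); // assumed folder name
        Directory.CreateDirectory(archive);            // does nothing if it already exists

        var handled = 0;
        // GetFiles without a SearchOption only looks at the top directory,
        // so files already inside Archive are not returned.
        foreach (var file in Directory.GetFiles(inbox, pattern))
        {
            processFile(file);

            var target = Path.Combine(archive, Path.GetFileName(file));
            if (File.Exists(target))
                File.Delete(target); // replace an older archived copy with the same name
            File.Move(file, target);
            handled++;
        }
        return handled;
    }
}
```

Because a processed file leaves the watched folder, anything still sitting in it is by definition unprocessed, so the worker can simply call this in a loop with a short sleep between passes.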

If moving the files is not an option, the BackgroundWorker might still be the answer. Track the last-modified or creation time of each file (FileInfo.LastWriteTime or FileInfo.CreationTime) and process any that are newer. In your onCreated handler, you would create an instance of BackgroundWorker and have it DoWork on the file. With your processing taking 20 seconds, I'm assuming all of that logic is called directly in the onCreated handler. By moving it to another thread, the handler returns almost instantly while the other thread churns away until it's done.
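
The timestamp idea might be sketched as follows. NewFileTracker and GetNewFiles are hypothetical names, and the sketch assumes the last-write time is a good enough marker; a file that arrives later with exactly the same timestamp as the newest one already seen would be skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: remembers the newest last-write time it has handed out.
public sealed class NewFileTracker
{
    private DateTime lastSeenUtc = DateTime.MinValue;

    // Returns the files written after the newest file returned by the previous call,
    // oldest first, and advances the remembered timestamp.
    public IReadOnlyList<string> GetNewFiles(string directory, string pattern)
    {
        var fresh = new DirectoryInfo(directory)
            .GetFiles(pattern)
            .Where(f => f.LastWriteTimeUtc > lastSeenUtc)
            .OrderBy(f => f.LastWriteTimeUtc)
            .ToList();

        if (fresh.Count > 0)
            lastSeenUtc = fresh[fresh.Count - 1].LastWriteTimeUtc;

        return fresh.Select(f => f.FullName).ToList();
    }
}
```

A background thread could call GetNewFiles periodically and process whatever comes back, which leaves the files where they are and still avoids doing the 20-second work inside the onCreated handler.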