How do I compare one collection of files to another?

Posted 2019-06-12 20:30

Question:

I am just learning C# (I have been fiddling with it for about two days now), and I've decided that, for learning purposes, I will rebuild an old app I made in VB6 for syncing files (generally across a network).

When I wrote the code in VB6, it worked roughly like this:

  1. Create a Scripting.FileSystemObject
  2. Create directory objects for the source and destination
  3. Create file listing objects for the source and destination
  4. Iterate through the source object, and check to see if it exists in the destination
    • if not, create it
    • if so, check to see if the source version is newer/larger, and if so, overwrite the other
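Step 4 can be expressed in C# as a pure decision over two listings that have already been read into memory, so no per-file trip to the network is needed. This is a minimal sketch, not the original VB6 logic: FileSnapshot and SyncPlanner.PlanCopies are hypothetical names, and "newer/larger" is interpreted as newer or larger.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of the metadata the sync compares (size and last write time).
public sealed class FileSnapshot
{
    public long Length { get; }
    public DateTime LastWriteUtc { get; }

    public FileSnapshot(long length, DateTime lastWriteUtc)
    {
        Length = length;
        LastWriteUtc = lastWriteUtc;
    }
}

// Hypothetical helper: decides which source files to copy, using only in-memory data.
public static class SyncPlanner
{
    // Returns names of source files that are missing from the destination,
    // or that are newer or larger than the destination copy.
    public static List<string> PlanCopies(
        IDictionary<string, FileSnapshot> source,
        IDictionary<string, FileSnapshot> dest)
    {
        var toCopy = new List<string>();
        foreach (var pair in source)
        {
            FileSnapshot existing;
            if (!dest.TryGetValue(pair.Key, out existing))
            {
                toCopy.Add(pair.Key);   // not in destination: create it
            }
            else if (pair.Value.LastWriteUtc > existing.LastWriteUtc
                     || pair.Value.Length > existing.Length)
            {
                toCopy.Add(pair.Key);   // newer or larger: overwrite it
            }
        }
        return toCopy;
    }
}
```

On Windows shares you would probably build both dictionaries with StringComparer.OrdinalIgnoreCase, so that "Report.TXT" and "report.txt" are treated as the same file.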

So far, this is what I have:

private bool syncFiles(string sourcePath, string destPath) {
    DirectoryInfo source = new DirectoryInfo(sourcePath);
    DirectoryInfo dest = new DirectoryInfo(destPath);

    if (!source.Exists) {
        LogLine("Source Folder Not Found!");
        return false;
    }

    if (!dest.Exists) {
        LogLine("Destination Folder Not Found!");
        return false;
    }

    FileInfo[] sourceFiles = source.GetFiles();
    FileInfo[] destFiles = dest.GetFiles();

    foreach (FileInfo file in sourceFiles) {
        // check exists on file
    }

    if (optRecursive.Checked) {
        foreach (DirectoryInfo subDir in source.GetDirectories()) {
            // create-if-not-exists destination subdirectory
            syncFiles(sourcePath + subDir.Name, destPath + subDir.Name);
        }
    }
    return true;
}

I have read examples that seem to advocate using the FileInfo or DirectoryInfo objects and checking their "Exists" property, but I am specifically looking for a way to search a collection/list of files I have already retrieved, rather than making a live check against the file system for each file. I will be doing this across the network, and repeatedly going back to a directory with thousands of files is very slow.
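One way to get exactly that is to list the destination once and index it by name, so every later "does it exist?" question is a dictionary lookup in memory. A minimal sketch, assuming case-insensitive file names (as on Windows shares); DirectoryIndex, IndexByName and MissingInDestination are hypothetical helpers, not part of .NET. As far as I know, the FileInfo objects returned by GetFiles() generally carry the size and timestamps captured during the listing, so reading those properties afterwards should not require another round trip.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helpers: one directory listing per side, then in-memory lookups only.
public static class DirectoryIndex
{
    // Lists the directory once and indexes the files by name.
    // Case-insensitive keys are an assumption matching Windows file naming.
    public static Dictionary<string, FileInfo> IndexByName(DirectoryInfo dir)
    {
        var index = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (FileInfo file in dir.GetFiles())
        {
            index[file.Name] = file;
        }
        return index;
    }

    // Returns the source files that have no same-named file in the destination.
    public static List<FileInfo> MissingInDestination(DirectoryInfo source, DirectoryInfo dest)
    {
        Dictionary<string, FileInfo> destIndex = IndexByName(dest);
        var missing = new List<FileInfo>();
        foreach (FileInfo file in source.GetFiles())
        {
            if (!destIndex.ContainsKey(file.Name))   // lookup in memory, not on disk
            {
                missing.Add(file);
            }
        }
        return missing;
    }
}
```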

Thanks in advance.

Answer 1:

The GetFiles() method only returns files that actually exist; it doesn't make up files that don't. So all you have to do is check whether each file exists in the other list.

Something along these lines could work:

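// Any() is a LINQ extension method, so this needs "using System.Linq;"
var sourceFiles = source.GetFiles();
var destFiles = dest.GetFiles();

foreach (var file in sourceFiles)
{
    // Any() scans the in-memory destFiles array; no extra file system calls
    if (!destFiles.Any(x => x.Name == file.Name))
    {
        // Do whatever (e.g. copy the file to the destination)
    }
}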

Note: of course, there is no guarantee that nothing has changed after you call GetFiles(). For example, a file could be deleted or renamed before you try to copy it.
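Since the listing can go stale, the copy step itself can treat a vanished file as something to skip rather than a crash. A minimal sketch; SafeCopy.TryCopy is a hypothetical helper, and its log parameter stands in for the question's LogLine method:

```csharp
using System;
using System.IO;

// Hypothetical helper: copies one file, tolerating files that disappeared
// between the GetFiles() listing and the copy.
public static class SafeCopy
{
    // Returns true if the copy happened, false if the path was no longer there.
    public static bool TryCopy(string sourceFile, string destFile, Action<string> log)
    {
        try
        {
            File.Copy(sourceFile, destFile, true);   // true = overwrite existing file
            return true;
        }
        catch (FileNotFoundException)
        {
            log("Skipped (file no longer exists): " + sourceFile);
            return false;
        }
        catch (DirectoryNotFoundException)
        {
            log("Skipped (path not found): " + sourceFile);
            return false;
        }
    }
}
```

Other failures, such as access denied (UnauthorizedAccessException) or a file locked by another process (IOException), still propagate here; whether to catch those as well depends on how the app should report them.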


This could perhaps be done more neatly with the Except method or something similar, for example:

// Except() is also a LINQ extension method ("using System.Linq;")
var sourceFiles = source.GetFiles();
var destFiles = dest.GetFiles();

var sourceFilesMissingInDestination = sourceFiles.Except(destFiles, new FileNameComparer());

foreach (var file in sourceFilesMissingInDestination)
{
    // Do whatever
}

Where the FileNameComparer is implemented like so:

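using System;
using System.Collections.Generic;
using System.IO;

public class FileNameComparer : IEqualityComparer<FileInfo>
{
    // Compares files by name only. OrdinalIgnoreCase matches how Windows
    // treats file names; use Ordinal instead for case-sensitive file systems.
    public bool Equals(FileInfo x, FileInfo y)
    {
        return string.Equals(x.Name, y.Name, StringComparison.OrdinalIgnoreCase);
    }

    // Must agree with Equals: names that compare equal must hash equally.
    public int GetHashCode(FileInfo obj)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name);
    }
}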

Untested though :p



Answer 2:

One small detail: instead of

 sourcePath + subDir.Name

I would use

 System.IO.Path.Combine(sourcePath, subDir.Name)

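Path performs reliable, OS-independent operations on file and folder names. For example, Path.Combine inserts the directory separator when sourcePath does not already end with one, whereas plain concatenation silently glues the two names together.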
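A quick illustration of the difference, as a sketch (PathDemo is a hypothetical wrapper used only for this example):

```csharp
using System;
using System.IO;

// Hypothetical demo class contrasting string concatenation with Path.Combine.
public static class PathDemo
{
    // Concatenation: no separator is added, so "src" + "sub" becomes "srcsub".
    public static string Concatenated(string parent, string child)
    {
        return parent + child;
    }

    // Path.Combine adds the platform's separator only if parent lacks one.
    public static string Combined(string parent, string child)
    {
        return Path.Combine(parent, child);
    }
}
```

For instance, with sourcePath = @"\\server\share\src" and subDir.Name = "sub", concatenation yields \\server\share\srcsub, while Path.Combine yields \\server\share\src\sub on Windows.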

I also notice optRecursive.Checked appearing out of nowhere. As a matter of good design, make it a parameter:

bool syncFiles(string sourcePath, string destPath, bool checkRecursive)
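Putting these suggestions together, the question's method might look roughly like this. It is a sketch under assumptions, not a drop-in replacement: the class name FolderSync and the log parameter (standing in for LogLine) are hypothetical, file names are compared case-insensitively, and "newer/larger" is read as newer or larger.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical class holding a sketch of the question's syncFiles.
public static class FolderSync
{
    // Recursion is controlled by a parameter instead of optRecursive.Checked,
    // and all paths are built with Path.Combine.
    public static bool SyncFiles(string sourcePath, string destPath, bool checkRecursive, Action<string> log)
    {
        var source = new DirectoryInfo(sourcePath);
        var dest = new DirectoryInfo(destPath);

        if (!source.Exists) { log("Source Folder Not Found!"); return false; }
        if (!dest.Exists) { log("Destination Folder Not Found!"); return false; }

        // One listing of the destination; lookups below happen in memory.
        var destFiles = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);
        foreach (FileInfo f in dest.GetFiles()) destFiles[f.Name] = f;

        foreach (FileInfo file in source.GetFiles())
        {
            FileInfo existing;
            bool copy = !destFiles.TryGetValue(file.Name, out existing)
                        || file.LastWriteTimeUtc > existing.LastWriteTimeUtc
                        || file.Length > existing.Length;
            if (copy)
            {
                file.CopyTo(Path.Combine(dest.FullName, file.Name), true);   // overwrite
                log("Copied " + file.Name);
            }
        }

        if (checkRecursive)
        {
            foreach (DirectoryInfo subDir in source.GetDirectories())
            {
                string destSub = Path.Combine(dest.FullName, subDir.Name);
                Directory.CreateDirectory(destSub);   // does nothing if it already exists
                SyncFiles(subDir.FullName, destSub, checkRecursive, log);
            }
        }
        return true;
    }
}
```

The form would then pass the checkbox value in, e.g. FolderSync.SyncFiles(src, dst, optRecursive.Checked, LogLine), assuming LogLine takes a string and returns void. That keeps the sync logic independent of the UI.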

And since you mention this may be used on large numbers of files, keep an eye out for .NET 4: it adds DirectoryInfo.EnumerateFiles(), an IEnumerable-returning alternative to GetFiles() that lets you process the files in a streaming fashion.
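A sketch of the streaming approach, using DirectoryInfo.EnumerateFiles() (available from .NET 4). StreamingSync and MissingInDestination are hypothetical names, and case-insensitive names are assumed. The destination names are still gathered up front, because every source file has to be checked against them; the source side, however, is consumed lazily.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper showing EnumerateFiles-based streaming.
public static class StreamingSync
{
    // Lazily yields source files that have no same-named file in the destination.
    // The caller can start copying before the source listing has been fully read.
    public static IEnumerable<FileInfo> MissingInDestination(DirectoryInfo source, DirectoryInfo dest)
    {
        // Runs on the first MoveNext(), since this is an iterator method.
        var destNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (FileInfo f in dest.EnumerateFiles())
            destNames.Add(f.Name);

        // Source entries are yielded one at a time as they are enumerated.
        foreach (FileInfo file in source.EnumerateFiles())
        {
            if (!destNames.Contains(file.Name))
                yield return file;
        }
    }
}
```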