glob pattern matching in .NET

Posted 2020-01-26 05:17

Is there a built-in mechanism in .NET to match patterns other than Regular Expressions? I'd like to match using UNIX style (glob) wildcards (* = any number of any character).

I'd like to use this for an end-user-facing control. I fear that permitting all RegEx capabilities would be very confusing.

Tags: c# .net glob
14 answers
对你真心纯属浪费
#2 · 2020-01-26 05:52

I wrote a FileSelector class that does selection of files based on filenames. It also selects files based on time, size, and attributes. If you just want filename globbing then you express the name in forms like "*.txt" and similar. If you want the other parameters then you specify a boolean logic statement like "name = *.xls and ctime < 2009-01-01" - implying an .xls file created before January 1st 2009. You can also select based on the negative: "name != *.xls" means all files that are not xls.

Check it out. Open source. Liberal license. Free to use elsewhere.

叼着烟拽天下
#3 · 2020-01-26 05:55

Just for the sake of completeness: since 2016, .NET Core has had a NuGet package called Microsoft.Extensions.FileSystemGlobbing that supports advanced globbing paths. (NuGet package)

Some examples are patterns that search nested folder structures and files with wildcards, which is very common in web development scenarios:

  • wwwroot/app/**/*.module.js
  • wwwroot/app/**/*.js

This works somewhat like the patterns that .gitignore files use to determine which files to exclude from source control.
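A minimal sketch of how such patterns might be evaluated with the package's Matcher class, assuming the Microsoft.Extensions.FileSystemGlobbing NuGet package is referenced. The class name GlobbingDemo, the method IsAppScript, and the *.spec.js exclusion are hypothetical and exist only for illustration; the string overload of Match is used so that nothing is read from disk.

```csharp
using System;
using Microsoft.Extensions.FileSystemGlobbing;

static class GlobbingDemo
{
    // Hypothetical helper: returns true when a relative path is matched by the
    // include pattern from the answer and not by the illustrative exclude pattern.
    public static bool IsAppScript(string relativePath)
    {
        var matcher = new Matcher();
        matcher.AddInclude("wwwroot/app/**/*.js");
        matcher.AddExclude("wwwroot/app/**/*.spec.js"); // assumption, not from the answer

        // Extension method that matches an in-memory path without touching the file system.
        return matcher.Match(relativePath).HasMatches;
    }
}
```

To enumerate real files instead, the same matcher can be passed a directory, for example via matcher.GetResultsInFullPath(directory).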

smile是对你的礼貌
#4 · 2020-01-26 05:59

Out of curiosity I glanced at Microsoft.Extensions.FileSystemGlobbing, and it pulled in a large number of dependencies on other libraries, so I decided to try writing something similar myself.

Easier said than done: I quickly noticed that it was not such a trivial function after all. For example, "*.txt" should match files only in the current directory, while "**.txt" should also harvest subfolders.

Microsoft also tests some odd pattern sequences like "./*.txt". I'm not sure who actually needs the "./" prefix, since it is removed during processing anyway. (https://github.com/aspnet/FileSystem/blob/dev/test/Microsoft.Extensions.FileSystemGlobbing.Tests/PatternMatchingTests.cs)

Anyway, I've coded my own function, and there will be two copies of it: one in SVN (which I might bug-fix later on) and one copied here for demo purposes. I recommend copying it from the SVN link.

SVN Link:

https://sourceforge.net/p/syncproj/code/HEAD/tree/SolutionProjectBuilder.cs#l800 (Search for matchFiles function if not jumped correctly).

And here is a local copy of the function:

// Usings needed at the top of the containing file:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

/// <summary>
/// Matches files from folder _dir using a glob file pattern.
/// In glob pattern matching, * matches any file or folder name, and ** matches any path (including sub-folders).
/// ? matches any single character.
/// 
/// There is also a third-party library for performing similar matching, 'Microsoft.Extensions.FileSystemGlobbing',
/// but it was dragging in a lot of dependencies, so I decided to do without it.
/// </summary>
/// <returns>List of files matching your selection</returns>
static public String[] matchFiles( String _dir, String filePattern )
{
    if (filePattern.IndexOfAny(new char[] { '*', '?' }) == -1)      // Speed up matching: with no asterisk / wildcard, the pattern is simply a file path.
    {
        String path = Path.Combine(_dir, filePattern);
        if (File.Exists(path))
            return new String[] { filePattern };
        return new String[] { };
    }

    String dir = Path.GetFullPath(_dir);        // Make it absolute, so we can extract relative paths later on.
    String[] pattParts = filePattern.Replace("/", "\\").Split('\\');
    List<String> scanDirs = new List<string>();
    scanDirs.Add(dir);

    //
    //  By default, glob pattern matching treats "*" as any file / folder name,
    //  which corresponds to any character except the folder separator - in regex that's "[^\\]*".
    //  Glob matching also allows a double asterisk "**", which recurses into subfolders.
    //  Here we split the match pattern into parts and match each part separately.
    //
    for (int iPatt = 0; iPatt < pattParts.Length; iPatt++)
    {
        bool bIsLast = iPatt == (pattParts.Length - 1);
        bool bRecurse = false;

        String regex1 = Regex.Escape(pattParts[iPatt]);         // Escape special regex control characters ("*" => "\*", "." => "\.")
        String pattern = Regex.Replace(regex1, @"\\\*(\\\*)?", delegate (Match m)
            {
                if (m.ToString().Length == 4)   // "**" => "\*\*" (escaped) - we need to recurse into sub-folders.
                {
                    bRecurse = true;
                    return ".*";
                }
                else
                    return @"[^\\]*";
            }).Replace(@"\?", ".");

        if (pattParts[iPatt] == "..")                           // Special kind of control, just to scan upper folder.
        {
            for (int i = 0; i < scanDirs.Count; i++)
                scanDirs[i] = scanDirs[i] + "\\..";

            continue;
        }

        // Anchor the pattern so that "*.txt" does not also match "a.txt.bak".
        Regex re = new Regex("^" + pattern + "$", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        int nScanItems = scanDirs.Count;
        for (int i = 0; i < nScanItems; i++)
        {
            String[] items;
            if (!bIsLast)
                items = Directory.GetDirectories(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);
            else
                items = Directory.GetFiles(scanDirs[i], "*", (bRecurse) ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly);

            foreach (String path in items)
            {
                String matchSubPath = path.Substring(scanDirs[i].Length + 1);
                if (re.Match(matchSubPath).Success)
                    scanDirs.Add(path);
            }
        }
        scanDirs.RemoveRange(0, nScanItems);    // Remove the items we have just scanned.
    } //for

    //  Make relative and return.
    return scanDirs.Select( x => x.Substring(dir.Length + 1) ).ToArray();
} //matchFiles

If you find any bugs, I'll be glad to fix them.

We Are One
#5 · 2020-01-26 06:01

I don't know if the .NET framework has glob matching, but couldn't you replace the * with .*? and use regexes?
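One way this suggestion could be sketched, using only System.Text.RegularExpressions. SimpleGlob, ToRegex and IsMatch are hypothetical names, not .NET APIs. The sketch assumes only * and ? are wildcards, escapes every other character, and anchors the pattern so that the whole input must match.

```csharp
using System;
using System.Text.RegularExpressions;

static class SimpleGlob
{
    // Hypothetical helper: converts a glob containing '*' and '?' into an anchored regex.
    public static Regex ToRegex(string glob)
    {
        // Regex.Escape turns '*' into "\*" and '?' into "\?", so the wildcards
        // can be swapped back in while everything else stays literal.
        string body = Regex.Escape(glob)
            .Replace(@"\*", ".*")   // '*' => any run of characters (including none)
            .Replace(@"\?", ".");   // '?' => exactly one character

        return new Regex("^" + body + "$", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static bool IsMatch(string input, string glob)
    {
        return ToRegex(glob).IsMatch(input);
    }
}
```

With this sketch, SimpleGlob.IsMatch("report.txt", "*.txt") is true, while "report.txt.bak" does not match "*.txt" because the pattern is anchored. Character classes such as [abc] are deliberately not supported here.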

Ridiculous、
#6 · 2020-01-26 06:02

From C# you can use .NET's LikeOperator.LikeString method. That's the backing implementation for VB's LIKE operator. It supports patterns using *, ?, #, [charlist], and [!charlist].

You can use the LikeString method from C# by adding a reference to the Microsoft.VisualBasic.dll assembly, which is included with every version of the .NET Framework. Then you invoke the LikeString method just like any other static .NET method:

using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;
...
bool isMatch = LikeOperator.LikeString("I love .NET!", "I love *", CompareMethod.Text);
// isMatch should be true.
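The other pattern tokens can be exercised the same way. This is a small sketch, assuming a reference to the Microsoft.VisualBasic assembly; LikeDemo and Like are hypothetical wrapper names introduced only for illustration.

```csharp
using System;
using Microsoft.VisualBasic;
using Microsoft.VisualBasic.CompilerServices;

static class LikeDemo
{
    // Hypothetical thin wrapper around the VB Like operator, using text (case-insensitive) comparison.
    public static bool Like(string input, string pattern)
    {
        return LikeOperator.LikeString(input, pattern, CompareMethod.Text);
    }
}

// Example calls, one per pattern token:
//   LikeDemo.Like("a.c", "?.c")             '?' stands for any single character
//   LikeDemo.Like("file7.txt", "file#.txt") '#' stands for any single digit
//   LikeDemo.Like("cat", "[bc]at")          [charlist] stands for one character from the list
//   LikeDemo.Like("hat", "[!bc]at")         [!charlist] stands for one character not in the list
```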
▲ chillily
#7 · 2020-01-26 06:04

https://www.nuget.org/packages/Glob.cs

https://github.com/mganss/Glob.cs

A GNU glob for .NET.

After installing, you can drop the package reference and just compile the single Glob.cs source file.

And since it's an implementation of GNU glob, the same patterns work across platforms and across languages wherever a similar implementation exists. Enjoy!
