Globbing using braces on Ruby 1.9.3

Posted 2019-04-06 18:10

Question:

Recent versions of Ruby support braces in globbing if you pass the File::FNM_EXTGLOB option.

From the 2.2.0 documentation:

File.fnmatch('c{at,ub}s', 'cats', File::FNM_EXTGLOB) #=> true  # { } is supported on FNM_EXTGLOB

However, the 1.9.3 documentation says braces aren't supported:

File.fnmatch('c{at,ub}s', 'cats')       #=> false # { } isn't supported

(Also, trying to use File::FNM_EXTGLOB raised a NameError.)

Is there any way to glob using braces in Ruby 1.9.3, such as a third-party gem?

The strings I want to match against are from S3, not a local file system, so I can't just ask the operating system to do the globbing as far as I know.

Answer 1:

I'm in the process of packaging up a Ruby Backport for braces globbing support. Here are the essential parts of that solution:

module File::Constants
  FNM_EXTGLOB = 0x10
end

class << File
  def fnmatch_with_braces_glob(pattern, path, flags = 0)
    regex = glob_convert(pattern, flags)
    return false unless regex

    # A failed match yields nil; guard it so nil.to_s ("") cannot equal an empty path.
    match = path.match(regex)
    !match.nil? && match.to_s == path
  end

  def fnmatch_with_braces_glob?(pattern, path, flags = 0)
    return fnmatch_with_braces_glob(pattern, path, flags)
  end

private
  def glob_convert(pattern, flags)
    brace_exp = (flags & File::FNM_EXTGLOB) != 0
    pathnames = (flags & File::FNM_PATHNAME) != 0
    dot_match = (flags & File::FNM_DOTMATCH) != 0
    no_escape = (flags & File::FNM_NOESCAPE) != 0
    casefold = (flags & File::FNM_CASEFOLD) != 0
    syscase = (flags & File::FNM_SYSCASE) != 0
    special_chars = ".*?\\[\\]{},.+()|$^\\\\" + (pathnames ? "/" : "")
    special_chars_regex = Regexp.new("[#{special_chars}]")

    if pattern.length == 0 || !pattern.index(special_chars_regex)
      return Regexp.new(pattern, casefold || syscase ? Regexp::IGNORECASE : 0)
    end

    # Convert glob to regexp and escape regexp characters
    length = pattern.length
    start = 0
    brace_depth = 0
    new_pattern = ""
    char = "/"

    loop do
      path_start = !dot_match && char[-1] == "/"

      index = pattern.index(special_chars_regex, start)

      if index
        new_pattern += pattern[start...index] if index > start
        char = pattern[index]

        snippet = case char
        when "?"  then path_start ? (pathnames ? "[^./]" : "[^.]") : ( pathnames ? "[^/]" : ".")
        when "."  then "\\."
        when "{"  then (brace_exp && (brace_depth += 1) >= 1) ? "(?:" : "\\{"
        when "}"  then (brace_exp && (brace_depth -= 1) >= 0) ? ")" : "\\}"
        when ","  then (brace_exp && brace_depth > 0) ? "|" : ","
        when "/"  then "/"
        when "\\"
          if !no_escape && index + 1 < length
            next_char = pattern[index += 1]
            special_chars.include?(next_char) ? "\\#{next_char}" : next_char
          else
            "\\\\"
          end
        when "*"
          if index+1 < length && pattern[index+1] == "*"
            char += "*"
            if pathnames && index+2 < length && pattern[index+2] == "/"
              char += "/"
              index += 2
              "(?:(?:#{path_start ? '[^.]' : ''}[^\/]*?\\#{File::SEPARATOR})(?:#{!dot_match ? '[^.]' : ''}[^\/]*?\\#{File::SEPARATOR})*?)?"
            else
              index += 1
              "(?:#{path_start ? '[^.]' : ''}(?:[^\\#{File::SEPARATOR}]*?\\#{File::SEPARATOR}?)*?)?"
            end
          else
            path_start ? (pathnames ? "(?:[^./][^/]*?)?" : "(?:[^.].*?)?") : (pathnames ? "[^/]*?" : ".*?")
          end
        when "["
          # Handle character set inclusion / exclusion
          start_index = index
          end_index = pattern.index(']', start_index+1)
          while end_index && pattern[end_index-1] == "\\"
            end_index = pattern.index(']', end_index+1)
          end
          if end_index
            index = end_index
            char_set = pattern[start_index..end_index]
            char_set.delete!('/') if pathnames
            char_set[1] = '^' if char_set[1] == '!'
            (char_set == "[]" || char_set == "[^]") ? "" : char_set
          else
            "\\["
          end
        else
          "\\#{char}"
        end

        new_pattern += snippet
      else
        if start < length
          snippet = pattern[start..-1]
          new_pattern += snippet
        end
      end

      break if !index
      start = index + 1
    end

    begin
      return Regexp.new("\\A#{new_pattern}\\z", casefold || syscase ? Regexp::IGNORECASE : 0)
    rescue
      return nil
    end
  end
end

This solution honors the flags accepted by File::fnmatch and translates the glob pattern into an equivalent Regexp. Once the method is patched in place of fnmatch (as described below), the following tests pass:

File.fnmatch('c{at,ub}s', 'cats', File::FNM_EXTGLOB)
#=> true
File.fnmatch('file{*.doc,*.pdf}', 'filename.doc')
#=> false
File.fnmatch('file{*.doc,*.pdf}', 'filename.doc', File::FNM_EXTGLOB)
#=> true
File.fnmatch('f*l?{[a-z].doc,[0-9].pdf}', 'filex.doc', File::FNM_EXTGLOB)
#=> true
File.fnmatch('**/.{pro,}f?l*', 'home/.profile', File::FNM_EXTGLOB | File::FNM_DOTMATCH)
#=> true
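
To make the core idea easier to see in isolation, here is a minimal sketch of just the brace handling: a glob made of literals, *, ?, and balanced braces is turned into an anchored Regexp with a non-capturing alternation. The helper name simple_brace_glob_to_regexp is hypothetical and not part of the backport; it ignores all flags, path separators, dotfiles, and character sets, and it assumes every { has a matching }.

```ruby
# Hypothetical helper for illustration only (not the backport's code).
# Assumes balanced braces; handles only literals, '*', '?', '{', '}', ','.
def simple_brace_glob_to_regexp(pattern)
  depth = 0
  body = pattern.each_char.map do |ch|
    case ch
    when "*" then ".*"            # any run of characters
    when "?" then "."             # any single character
    when "{"
      depth += 1
      "(?:"                       # open a non-capturing alternation
    when "}"
      if depth > 0
        depth -= 1
        ")"                       # close the alternation
      else
        Regexp.escape(ch)         # stray '}' stays literal
      end
    when ","
      depth > 0 ? "|" : ","       # a comma only separates inside braces
    else
      Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A#{body}\\z")
end

regex = simple_brace_glob_to_regexp("c{at,ub}s")
puts regex.source                 # \Ac(?:at|ub)s\z
puts !!(regex =~ "cats")          # true
puts !!(regex =~ "cots")          # false
```

The full glob_convert above does the same translation but additionally tracks FNM_PATHNAME, FNM_DOTMATCH, escapes, and character sets.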

The fnmatch_with_braces_glob method (and its ? variant) will be patched in place of fnmatch so that Ruby 2.0.0-compliant code also works on earlier Ruby versions. For clarity, the code shown above omits some performance improvements, argument checking, and the Backports feature detection and patch-in code; these will be included in the actual submission to the project.

I'm still testing some edge cases and heavily optimizing performance; it should be ready to submit very soon. Once it's available in an official Backports release, I'll update the status here.

Note that Dir::glob support will arrive at the same time.



Answer 2:

That was a fun Ruby exercise! No idea if this solution is robust enough for you, but here goes:

class File
  class << self
    def fnmatch_extglob(pattern, path, flags=0)
      explode_extglob(pattern).any?{|exploded_pattern|
        fnmatch(exploded_pattern,path,flags)
      }
    end

    def explode_extglob(pattern)
      if match=pattern.match(/\{([^{}]+)}/) then
        subpatterns = match[1].split(',',-1)
        subpatterns.map{|subpattern| explode_extglob(match.pre_match+subpattern+match.post_match)}.flatten
      else
        [pattern]
      end
    end
  end
end

Better testing is needed, but it seems to work fine for simple cases:

[2] pry(main)> File.explode_extglob('c{at,ub}s')
=> ["cats", "cubs"]
[3] pry(main)> File.explode_extglob('c{at,ub}{s,}')
=> ["cats", "cat", "cubs", "cub"]
[4] pry(main)> File.explode_extglob('{a,b,c}{d,e,f}{g,h,i}')
=> ["adg", "adh", "adi", "aeg", "aeh", "aei", "afg", "afh", "afi", "bdg", "bdh", "bdi", "beg", "beh", "bei", "bfg", "bfh", "bfi", "cdg", "cdh", "cdi", "ceg", "ceh", "cei", "cfg", "cfh", "cfi"]
[5] pry(main)> File.explode_extglob('{a,b}c*')
=> ["ac*", "bc*"]
[6] pry(main)> File.fnmatch('c{at,ub}s', 'cats')
=> false
[7] pry(main)> File.fnmatch_extglob('c{at,ub}s', 'cats')
=> true
[8] pry(main)> File.fnmatch_extglob('c{at,ub}s*', 'catsssss')
=> true

Tested with Ruby 1.9.3, 2.1.5, and 2.2.1.
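
Since the question is about matching key strings from S3 rather than files on disk, here is a rough sketch of how the same expand-then-match approach could filter a plain array of strings. The helper names expand_braces and select_matching are hypothetical, and the key names are made up for illustration; the expansion is a standalone copy of the explode_extglob idea so the snippet runs on its own.

```ruby
# Hypothetical standalone version of the brace expansion shown above.
def expand_braces(pattern)
  match = pattern.match(/\{([^{}]+)\}/)
  return [pattern] unless match

  # Expand the innermost brace group, then recurse on each result.
  match[1].split(",", -1).map { |alternative|
    expand_braces(match.pre_match + alternative + match.post_match)
  }.flatten
end

# Hypothetical helper: keep the keys that match any expanded pattern.
def select_matching(keys, pattern, flags = 0)
  globs = expand_braces(pattern)
  keys.select { |key| globs.any? { |glob| File.fnmatch(glob, key, flags) } }
end

# Made-up key names standing in for an S3 listing.
keys = [
  "logs/2015/app.log",
  "logs/2015/app.txt",
  "logs/2015/db.log",
  "images/cat.png"
]

p select_matching(keys, "logs/*/{app,db}.log", File::FNM_PATHNAME)
# => ["logs/2015/app.log", "logs/2015/db.log"]
```

Expanding the pattern once up front avoids repeating the brace expansion for every key, which may matter for large listings.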