PHP: filtering files and paths according to .gitignore

Posted 2020-07-18 04:43

Question:

I want to use PHP to list all the files and paths that a .gitignore configuration ignores, just like Git does.

It's possible to walk the directory recursively and filter each file with a regular expression, but that is very inefficient when the path contains a lot of files.

Is there a good, efficient way to find the files and paths ignored by .gitignore?

Answer 1:

You need to proceed in several steps:

1 - Find the .gitignore files

Each folder can have one, so don't assume there's a single one.

And submodules have a .git entry that links to the parent repository's .git folder (in current Git it is a small file rather than a directory), so be careful about where you decide to stop as well.

It'll go something like:

function find_gitignore_files($dir) {
  $files = array();
  while (true) {
    $file = "$dir/.gitignore";
    if (is_file($file)) $files[] = $file;
    if (is_dir("$dir/.git") && !is_link("$dir/.git")) break;  # repository root: stop here
    $parent = dirname($dir);
    if ($parent === $dir || $parent === '.') break;            # filesystem root ('/') or top of a relative path: stop here too
    $dir = $parent;
  }
  return $files;
}

2 - Parse each .gitignore file

You need to skip blank lines and comments, handle the negation operator (!), and expand the globs.

Give or take, it's going to look something like:

function parse_git_ignore_file($file) { # $file = '/absolute/path/to/.gitignore'
  $dir = dirname($file);
  $matches = array();
  $lines = file($file, FILE_IGNORE_NEW_LINES);
  foreach ($lines as $line) {
    $line = trim($line);
    if ($line === '') continue;                 # empty line
    if (substr($line, 0, 1) == '#') continue;   # a comment
    if (substr($line, 0, 1) == '!') {           # negated glob: un-ignore files matched by earlier lines
      $line = substr($line, 1);
      $matches = array_diff($matches, glob("$dir/$line") ?: array());
    } else {                                    # normal glob (glob() returns false on error)
      $matches = array_merge($matches, glob("$dir/$line") ?: array());
    }
  }
  return array_values(array_unique($matches));
}

(Note: none of the above is tested, but it should point you in the right direction.)
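glob() only expands a pattern inside $dir itself, whereas Git lets a pattern without a slash match at any depth and understands **. One way around both limits, which also avoids one filesystem scan per pattern, is to translate each pattern into a regular expression and test the paths you already have. gitignore_pattern_to_regex() below is a hypothetical helper written from the pattern rules in the gitignore documentation, and it is simplified: bracket expressions such as [a-z] are matched literally, and applying a directory match to the files inside it is left to the caller.

```php
<?php
/**
 * Hypothetical helper: convert one .gitignore pattern into a PCRE regex that is
 * tested against a path relative to the directory holding the .gitignore file.
 * Simplified: bracket expressions like [a-z] are matched literally.
 * Returns [regex, dirOnly]; when dirOnly is true the caller must check is_dir().
 */
function gitignore_pattern_to_regex(string $pattern): array
{
    $dirOnly = substr($pattern, -1) === '/';   // "build/" only matches directories
    $pattern = rtrim($pattern, '/');
    // A slash at the start or in the middle anchors the pattern to the .gitignore
    // directory; otherwise the pattern may match at any depth below it.
    $anchored = strpos($pattern, '/') !== false;
    $pattern = ltrim($pattern, '/');

    $regex = '';
    $len = strlen($pattern);
    for ($i = 0; $i < $len; $i++) {
        $c = $pattern[$i];
        if (substr($pattern, $i, 3) === '**/') {
            $regex .= '(?:.*/)?';          // "**/" = zero or more directories
            $i += 2;
        } elseif (substr($pattern, $i, 2) === '**') {
            $regex .= '.*';                // trailing "**" = everything inside
            $i += 1;
        } elseif ($c === '*') {
            $regex .= '[^/]*';             // "*" never crosses a slash
        } elseif ($c === '?') {
            $regex .= '[^/]';              // "?" is one character, but not a slash
        } elseif ($c === '\\' && $i + 1 < $len) {
            $regex .= preg_quote($pattern[++$i], '#');   // backslash escapes the next char
        } else {
            $regex .= preg_quote($c, '#');
        }
    }
    $prefix = $anchored ? '' : '(?:.*/)?';
    return ['#^' . $prefix . $regex . '$#', $dirOnly];
}
```

To decide whether a path is ignored, run it through the rules of a .gitignore in order and let the last matching rule win, flipping the result for negated (!) rules; that is the precedence Git documents within a single file.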



Answer 2:

Just a crazy idea: if you rely on Git to give you the patterns for ignored files, why not rely on it to give you the list of included/ignored files too? Just issue a command like:

  • git ls-files for all tracked files
  • git clean -ndX, or git ls-files --others --ignored --exclude-standard (or --exclude-from=[Path_To_Your_Global].gitignore instead of --exclude-standard), for all ignored files

Check which Git command gives you the most useful output, then loop through the paths it lists.

And a word of caution: take the necessary precautions when executing external commands (for example, quote every argument with escapeshellarg())!

Sources:

  • Show ignored files in git
  • List files in local git repo?
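A minimal PHP sketch of the git ls-files route, assuming the git binary is on the PATH and shell_exec() is enabled. The function names are hypothetical; the command uses git's -C option to run inside the repository and -z to get NUL-separated paths, so names containing spaces or newlines survive, and escapeshellarg() quotes the directory argument.

```php
<?php
/**
 * Sketch: ask Git itself for the ignored, untracked files under $repoDir.
 * Assumes the git binary is on PATH and shell_exec() is not disabled.
 */
function git_ignored_files(string $repoDir): array
{
    $cmd = 'git -C ' . escapeshellarg($repoDir)
         . ' ls-files --others --ignored --exclude-standard -z';
    $output = shell_exec($cmd);   // string on success, null/false on failure
    return split_nul_list(is_string($output) ? $output : '');
}

/** Split NUL-separated command output into a list of paths, dropping empty entries. */
function split_nul_list(string $output): array
{
    return array_values(array_filter(explode("\0", $output), 'strlen'));
}
```

The paths come back relative to $repoDir. Adding --directory makes Git report a wholly ignored directory once instead of listing every file inside it.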


Answer 3:

I use this function to read the whole directory tree recursively, and it works well:

function read_dir($dir)
{
    $files = array();
    $dir = rtrim($dir, '/') . '/';                  # make sure $dir ends with exactly one slash
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') continue;
        $path = realpath($dir . $entry);            # resolve symlinks and collapse duplicate slashes
        if ($path === false) continue;              # e.g. a broken symlink
        if (is_dir($path)) {
            $files = array_merge($files, read_dir($path));   # recurse into subdirectories
        }
        $files[] = $path;
    }
    return $files;
}

UPDATE: You can then run preg_grep over the result of that function, for example to keep only the .gitignore files in the tree:

$files = preg_grep('~/\.gitignore$~', read_dir($path));


Answer 4:

Entries in a .gitignore are mostly glob patterns. You can read each line of your .gitignore with PHP's file() function, skip empty lines and lines that start with #, and then expand the patterns with PHP's glob() function (http://php.net/manual/en/function.glob.php).
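If you already have the list of paths (from a directory walk, say), a variant of this idea tests each path against the patterns with PHP's fnmatch() instead of calling glob() once per pattern. A rough sketch: is_ignored_by_lines() is a hypothetical name, ** is not supported, the trailing "/" of directory-only patterns is simply dropped, and a pattern that matches a directory is not applied to the files inside it.

```php
<?php
/**
 * Sketch: decide whether one relative path is ignored by the lines of a
 * .gitignore, using fnmatch() on the path instead of glob() on the disk.
 */
function is_ignored_by_lines(array $lines, string $relativePath): bool
{
    $ignored = false;
    foreach ($lines as $line) {
        $line = trim($line);                          // also strips the newline from file()
        if ($line === '' || $line[0] === '#') continue;   // blank line or comment
        $negated = $line[0] === '!';
        if ($negated) $line = substr($line, 1);
        $pattern = rtrim($line, '/');                 // directory-only marker is ignored here
        if (strpos($pattern, '/') === false) {
            // No slash: the pattern may match a name at any depth.
            $match = fnmatch($pattern, basename($relativePath));
        } else {
            // Slash present: anchored to the .gitignore directory; "*" must not cross "/".
            $match = fnmatch(ltrim($pattern, '/'), $relativePath, FNM_PATHNAME);
        }
        if ($match) {
            $ignored = !$negated;                     // the last matching line wins
        }
    }
    return $ignored;
}
```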



Answer 5:

You can get an array of files to ignore from a .gitignore file and check against that. To do that, you would need to read the file and match files using the glob function.

First, get the contents of the file:

$contents = file_get_contents($pathToGitIgnoreFile);
$path = dirname(realpath($pathToGitIgnoreFile));

The directory containing the .gitignore file ($path) is the base that its patterns are matched against, so keep it for later.

Next, we need to split the contents into individual rules. Rules start on their own line in the file. Lines that start with the pound symbol (#) are comments, so we can just use a regular expression to find non-blank lines that aren't comments:

$rules = array();
preg_match_all('/^[^#\s][^\r\n]*/m', $contents, $rules);   // non-blank lines that don't start with #
$rules = array_map('rtrim', $rules[0]);                     // whole matches, trailing spaces removed

Then all you have to do is iterate through the rules and use glob to create an array of file names to ignore:

$files = array();
foreach ($rules as $rule)
{
    if (strpos($rule, '!') === 0) // negative rule: remove files matched by earlier rules
        $files = array_diff($files, glob($path . DIRECTORY_SEPARATOR . substr($rule, 1)) ?: array());
    else                          // glob() returns false on error, so fall back to an empty array
        $files = array_merge($files, glob($path . DIRECTORY_SEPARATOR . $rule) ?: array());
}
$files = array_unique($files);

I didn't test this code, so comment below if it doesn't work for you.
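Skipping lines that start with # misses a few details of the gitignore format: trailing spaces are ignored unless escaped with a backslash, and a leading \# or \! stands for a literal # or !. A sketch that handles those and records negation separately; the name gitignore_rules() and the shape of its return value are illustrative choices, not an existing API.

```php
<?php
/**
 * Sketch: split .gitignore contents into rules.
 * Returns a list of ['pattern' => string, 'negated' => bool].
 */
function gitignore_rules(string $contents): array
{
    $rules = [];
    foreach (preg_split('/\r\n|\n|\r/', $contents) as $line) {
        // Trailing spaces are ignored unless escaped with a backslash ("foo\ ").
        $line = preg_replace('/(?<!\\\\) +$/', '', $line);
        if ($line === '' || $line[0] === '#') continue;   // blank line or comment
        $negated = $line[0] === '!';
        if ($negated) $line = substr($line, 1);
        // A leading "\#" or "\!" stands for a literal "#" or "!".
        if (isset($line[1]) && $line[0] === '\\' && ($line[1] === '#' || $line[1] === '!')) {
            $line = substr($line, 1);
        }
        $rules[] = ['pattern' => $line, 'negated' => $negated];
    }
    return $rules;
}
```

Each rule's pattern can then be passed to glob() (or any matcher) exactly as in the loop above, with 'negated' deciding whether matches are added or removed.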



Answer 6:

The SPL (Standard PHP Library) contains iterators for this job. I am limiting the example to filtering out all directories and files whose names start with a ".".

The rules for .gitignore are quite complex; parsing the entries and building a set of rules would go well beyond the scope of an example.

$directory = __DIR__;

$filtered = new RecursiveIteratorIterator(
  new RecursiveCallbackFilterIterator(
    new RecursiveDirectoryIterator($directory),
    function ($fileInfo, $key, $iterator) {
      // accept only entries whose name does not start with "."; rejected directories are not descended into
      return substr($fileInfo->getFilename(), 0, 1) != '.';
    }
  )
);


foreach ($filtered as $fileInfo) {
  echo (string)$fileInfo, "\n";
}
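The same iterators can apply real ignore rules efficiently: if the filter callback rejects an ignored directory, the iterator never descends into it. A sketch, where walk_not_ignored() and the $isIgnored callback are hypothetical stand-ins for whatever pattern matcher you build; $isIgnored receives a path relative to $root with "/" separators and a flag saying whether it is a directory.

```php
<?php
/**
 * Sketch: list the files under $root, pruning ignored directories instead of
 * visiting every file and filtering afterwards.
 */
function walk_not_ignored(string $root, callable $isIgnored): array
{
    $root = rtrim($root, '/\\');
    // Path of an entry relative to $root, always with "/" separators.
    $relative = function (SplFileInfo $fileInfo) use ($root) {
        return str_replace('\\', '/', substr($fileInfo->getPathname(), strlen($root) + 1));
    };
    $filter = new RecursiveCallbackFilterIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
        function (SplFileInfo $fileInfo) use ($relative, $isIgnored) {
            // Returning false for a directory means its subtree is never visited.
            return !$isIgnored($relative($fileInfo), $fileInfo->isDir());
        }
    );
    $paths = [];
    // LEAVES_ONLY (the default mode) yields files, not the directories themselves.
    foreach (new RecursiveIteratorIterator($filter) as $fileInfo) {
        $paths[] = $relative($fileInfo);
    }
    sort($paths);
    return $paths;
}
```

Pruning at the directory level is what keeps this cheap in trees with large ignored folders, since nothing below a rejected directory is read from disk.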