Iterate directories in Perl, getting introspectabl

Posted 2019-07-26 06:41

I'm about to start a script that may have some file lookups and manipulation, so I thought I'd look into some packages that would assist me; mostly, I'd like the results of the iteration (or search) to be returned as objects, which would have (base)name, path, file size, uid, modification time, etc as some sort of properties.

The thing is, I don't do this all that often, and tend to forget APIs; when that happens, I'd rather let the code run on an example directory, and dump all of the properties in an object, so I can remind myself what is available where (obviously, I'd like to "dump", in order to avoid having to code custom printouts). However, I'm aware of the following:

list out all methods of object - perlmonks.org
"Out of the box Perl doesn't do object introspection. Class wrappers like Moose provide introspection as part of their implementation, but Perl's built in object support is much more primitive than that."

Anyways, I looked into:

... and started looking into the libraries referred there (also related link: rjbs's rubric: the speed of Perl file finders).

So, for one, File::Find::Object seems to work for me; this snippet:

use strict;
use warnings;
use Data::Dumper;
use File::Find::Object;

my @targetDirsToScan = ("./");

my $tree = File::Find::Object->new({}, @targetDirsToScan);
while (my $robh = $tree->next_obj()) {
  #print $robh ."\n"; # prints File::Find::Object::Result=HASH(0xa146a58)
  print Dumper($robh) ."\n";
}

... prints this:

# $VAR1 = bless( {
#                  'stat_ret' => [
#                                  2054,
#                                  429937,
#                                  16877,
#                                  5,
#                                  1000,
#                                  1000,
#                                  0,
#                                  '4096',
#                                  1405194147,
#                                  1405194139,
#                                  1405194139,
#                                  4096,
#                                  8
#                                ],
#                  'base' => '.',
#                  'is_link' => '',
#                  'is_dir' => 1,
#                  'path' => '.',
#                  'dir_components' => [],
#                  'is_file' => ''
#                }, 'File::Find::Object::Result' );
# $VAR1 = bless( {
#                  'base' => '.',
#                  'is_link' => '',
#                  'is_dir' => '',
#                  'path' => './test.blg',
#                  'is_file' => 1,
#                  'stat_ret' => [
#                                  2054,
#                                  423870,
#                                  33188,
#                                  1,
#                                  1000,
#                                  1000,
#                                  0,
#                                  '358',
#                                  1404972637,
#                                  1394828707,
#                                  1394828707,
#                                  4096,
#                                  8
#                                ],
#                  'basename' => 'test.blg',
#                  'dir_components' => []

... which is mostly what I wanted, except the stat results are an array, and I'd have to know its layout (($dev,$ino,$mode,$nlink,$uid,$gid,$rdev,$size,$atime,$mtime,$ctime,$blksize,$blocks) stat - perldoc.perl.org) to make sense of the printout.
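One way around memorising the stat layout, as a minimal sketch: pair the list with the field names from perldoc -f stat using a hash slice, so that Data::Dumper prints each name next to its value. The helper name named_stat is made up for illustration and is not part of File::Find::Object. (The core File::stat module offers the same fields as named accessor methods, if a dumpable hash is not needed.)

```perl
use strict;
use warnings;
use Data::Dumper;

# Field order as documented in `perldoc -f stat`.
my @STAT_FIELDS = qw(
    dev ino mode nlink uid gid rdev size
    atime mtime ctime blksize blocks
);

# named_stat() is a hypothetical helper: it turns the 13-element list
# returned by stat() into a hash reference keyed by field name.
sub named_stat {
    my @values = @_;
    my %named;
    @named{@STAT_FIELDS} = @values;   # hash slice: names paired with values by position
    return \%named;
}

# Sort keys so the dump is stable between runs.
local $Data::Dumper::Sortkeys = 1;
print Dumper(named_stat(stat('.')));
```

With a File::Find::Object::Result in $robh, the same helper could presumably be fed the existing array, as in named_stat(@{ $robh->stat_ret }).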

Then I looked into IO::All, which I like because of utf-8 handling (but also, say, socket functionality, which would be useful to me for an unrelated task in the same script); and I was thinking I'd use this package instead. The problem is, I have a very hard time discovering what the available fields in the object returned are; e.g. with this code:

use Data::Dumper;
my @targetDirsToScan = ("./");

use IO::All -utf8;
my $io = io(@targetDirsToScan);
my @contents = $io->all(0);
for my $contentry ( @contents ) {
  #print Dumper($contentry) ."\n"; 
  # $VAR1 = bless( \*Symbol::GEN298, 'IO::All::File' );
  # $VAR1 = bless( \*Symbol::GEN307, 'IO::All::Dir' ); ...
  #print $contentry->uid . " -/- " . $contentry->mtime . "\n";
  # https://stackoverflow.com/q/24717210/printing-ret-of-ioall-w-datadumper
  print Dumper \%{*$contentry}; # doesn't list uid
}

... I get a printout like this:

# $VAR1 = {
#           '_utf8' => 1,
#           'constructor' => sub { "DUMMY" },
#           'is_open' => 0,
#           'io_handle' => undef,
#           'name' => './test.blg',
#           '_encoding' => 'utf8',
#           'package' => 'IO::All'
#         };
# $VAR1 = {
#           '_utf8' => 1,
#           'constructor' => sub { "DUMMY" },
#           'mode' => undef,
#           'name' => './testdir',
#           'package' => 'IO::All',
#           'is_absolute' => 0,
#           'io_handle' => undef,
#           'is_open' => 0,
#           '_assert' => 0,
#           '_encoding' => 'utf8'

... which clearly doesn't show attributes like mtime, etc. - even if they exist (which you can see if you uncomment the respective print line).

I've also tried Data::Printer's (How can I perform introspection in Perl?) p() function - it prints exactly the same fields as Dumper. I also tried to use print Dumper \%{ref ($contentry) . "::"}; (list out all methods of object - perlmonks.org), and this prints stuff like:

'O_SEQUENTIAL' => *IO::All::File::O_SEQUENTIAL,
'mtime' => *IO::All::File::mtime,
'DESTROY' => *IO::All::File::DESTROY,
...
'deep' => *IO::All::Dir::deep,
'uid' => *IO::All::Dir::uid,
'name' => *IO::All::Dir::name,
...

... but only if you use the print $contentry->uid ... line beforehand; else they are not listed! I guess that relates to this:

introspection - How do I list available methods on a given object or package in Perl? #911294
In general, you can't do this with a dynamic language like Perl. The package might define some methods that you can find, but it can also make up methods on the fly that don't have definitions until you use them. Additionally, even calling a method (that works) might not define it. That's the sort of things that make dynamic languages nice. :)
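To make the quoted point concrete, here is a generic sketch of a class whose methods only appear in the symbol table after their first call. Lazy::Demo and defined_subs are hypothetical names invented for this illustration; this shows the general AUTOLOAD technique and makes no claim about how IO::All is actually implemented.

```perl
use strict;
use warnings;

package Lazy::Demo;                  # hypothetical class, for illustration only
our $AUTOLOAD;

sub new { return bless { size => 42 }, shift }

# Called for any method that has no definition yet.
sub AUTOLOAD {
    my $self = $_[0];
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    die "No such method: $name" unless ref $self && exists $self->{$name};
    no strict 'refs';
    *{$AUTOLOAD} = sub { $_[0]{$name} };   # install a real accessor on first use
    goto &$AUTOLOAD;                       # then run it with the original arguments
}

package main;

# List the subroutines currently defined in a package's symbol table.
sub defined_subs {
    my ($class) = @_;
    no strict 'refs';
    return sort grep { defined &{"${class}::$_"} } keys %{"${class}::"};
}

my $obj    = Lazy::Demo->new;
my @before = defined_subs('Lazy::Demo');   # 'size' is not in the table yet
my $value  = $obj->size;                   # first call defines it
my @after  = defined_subs('Lazy::Demo');   # now 'size' is listed

print "before: @before\n";
print "after:  @after\n";
```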

Still, that prints the name and type of the field - I'd want the name and value of the field instead.

So, I guess my main question is - how can I dump an IO::All result, so that all fields (including stat ones) are printed out with their names and values (as is mostly the case with File::Find::Object)?

(I noticed the IO::All results can be of type, say, IO::All::File, but its docs defer to "See IO::All", which doesn't discuss IO::All::File explicitly much at all. I thought, if I could "cast" \%{*$contentry} to a IO::All::File, maybe then mtime etc fields will be printed - but is such a "cast" possible at all?)

If that is problematic, are there other packages, that would allow introspective printout of directory iteration results - but with named fields for individual stat properties?

3 answers
Summer. ? 凉城
#2 · 2019-07-26 07:04

Ok, this is more or less an exercise (and a reminder for me); below is some code where I've tried to define a class (File::Find::Object::StatObj) with accessors for all of the stat fields. Then, I have the hack for IO::All::File from Replacing a class in Perl ("overriding"/"extending" a class with same name)?, where an mtimef field is added which corresponds to mtime, just as a reminder.

Then, just to see what sort of interface I could have between the two libraries, I have IO::All doing the iterating; and the current file path is passed to File::Find::Object, from which we obtain a File::Find::Object::Result - which has been "hacked" to also show the File::Find::Object::StatObj; but that one is only generated after a call to the hacked Result's full_components (that might as well have been a separate function). Notice that in this case, you won't get full_components/dir_components of File::Find::Object::Result -- because apparently it is not File::Find::Object doing the traversal here, but IO::All. Anyways, the result is something like this:

#  $VAR1 = {
#            '_utf8' => 1,
#            'mtimef' => 1403956165,
#            'constructor' => sub { "DUMMY" },
#            'is_open' => 0,
#            'io_handle' => undef,
#            'name' => 'img/test.png',
#            '_encoding' => 'utf8',
#            'package' => 'IO::All'
#          };
#  img/test.png
# >  - $VAR1 = bless( {
#                   'base' => 'img/test.png',
#                   'is_link' => '',
#                   'is_dir' => '',
#                   'path' => 'img/test.png',
#                   'is_file' => 1,
#                   'stat_ret' => [
#                                   2054,
#                                   426287,
#                                   33188,
#                                   1,
#                                   1000,
#                                   1000,
#                                   0,
#                                   '37242',
#                                   1405023944,
#                                   1403956165,
#                                   1403956165,
#                                   4096,
#                                   80
#                                 ],
#                   'basename' => undef,
#                   'stat_obj' => bless( {
#                                          'blksize' => 4096,
#                                          'ctime' => 1403956165,
#                                          'rdev' => 0,
#                                          'blocks' => 80,
#                                          'uid' => 1000,
#                                          'dev' => 2054,
#                                          'mtime' => 1403956165,
#                                          'mode' => 33188,
#                                          'size' => '37242',
#                                          'nlink' => 1,
#                                          'atime' => 1405023944,
#                                          'ino' => 426287,
#                                          'gid' => 1000
#                                        }, 'File::Find::Object::StatObj' ),
#                   'dir_components' => []
#                 }, 'File::Find::Object::Result' );

I'm not sure how correct this would be, but what I like about this is that I could forget where the fields are; then I could rerun the dumper, and see that I could get, say, the size via (*::Result)->stat_obj->size (and likewise mtime via ->stat_obj->mtime) - and that seems to work (here I'd need just to read these, not to set them).

Anyways, here is the code:

use Data::Dumper;
my @targetDirsToScan = ("./");

use IO::All -utf8 ;                          # Turn on utf8 for all io

# try to "replace" the IO::All::File class
{ # https://stackoverflow.com/a/24726797/277826
  package IO::All::File;
  use IO::All::File; # -base; # just do not use `-base` here?!

  # hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
  # NB: field is a sub in /usr/local/share/perl/5.10.1/IO/All/Base.pm
  field mtimef => undef; # hack

  sub file {
    my $self = shift;
    bless $self, __PACKAGE__;
    $self->name(shift) if @_;
    $self->mtimef($self->mtime); # hack
    #print("!! *haxx0rz'd* file() reporting in\n");
    return $self->_init;
  }

  1;
}

use File::Find::Object;
# based on /usr/local/share/perl/5.10.1/File/Find/Object/Result.pm;
# but inst. from /usr/local/share/perl/5.10.1/File/Find/Object.pm
{
  package File::Find::Object::StatObj;
  use integer;
  use Tie::IxHash;
  #use Data::Dumper;
  sub ordered_hash { # https://stackoverflow.com/a/3001400/277826
    #my (@ar) = @_; #print("# ". join(",",@ar) . "\n");
    tie my %hash => 'Tie::IxHash';
    %hash = @_; #print Dumper(\%hash);
    \%hash
  }
  my $fields = ordered_hash(
        # from http://perldoc.perl.org/functions/stat.html
        (map { $_ => $_ } (qw(
        dev ino mode nlink uid gid rdev size
        atime mtime ctime blksize blocks
        )))
      ); #print Dumper(\%{$fields});
  use Class::XSAccessor
      #accessors => %{$fields}, # cannot - is seemingly late
      # ordered_hash gets accepted, but doesn't matter in final dump;
      #accessors => { (map { $_ => $_ } (qw(
      accessors => ordered_hash( (map { $_ => $_ } (qw(
        dev ino mode nlink uid gid rdev size
        atime mtime ctime blksize blocks
      ))) ),
      #))) },
      ;
  use Fcntl qw(:mode);
  sub new
  {
    #my $self = shift;
    my $class = shift;
    my @stat_arr = @_; # the rest
    my $ic = 0;
    my $self = {};
    bless $self, $class;
    for my $k (keys %{$fields}) {
      my $fld = $fields->{$k};
      #print "$ic '$k' '$fld' ".join(", ",$stat_arr[$ic])." ; ";
      $self->$fld($stat_arr[$ic]);
      $ic++;
    }
    #print "\n";
    return $self;
  }
  1;
}

# try to "replace" the File::Find::Object::Result
{
  package File::Find::Object::Result;
  use File::Find::Object::Result;
  #use File::Find::Object::StatObj; # no, has no file!

  use Class::XSAccessor replace => 1,
      accessors => {
          (map { $_ => $_ } (qw(
          base
          basename
          is_dir
          is_file
          is_link
          path
          dir_components
          stat_ret
          stat_obj
          )))
      }
      ;

  #use Fcntl qw(:mode);
  #sub new # never gets called
  sub full_components
  {
    my $self = shift; #print("NEWCOMP\n");
    my $sobj = File::Find::Object::StatObj->new(@{$self->stat_ret()});
    $self->stat_obj($sobj); # add stat_obj and its fields
    return
    [
      @{$self->dir_components()},
      ($self->is_dir() ? () : $self->basename()),
    ];
  }
  1;
}

# main script start

my $io = io($targetDirsToScan[0]);
my @contents = $io->all(0);                    # Get all contents of dir
for my $contentry ( @contents ) {
  print Dumper \%{*$contentry};
  print $contentry->name . "\n"; # img/test.png
  # get a File::Find::Object::Result - must instantiate
  #  a File::Find::Object; just item_obj() will return undef
  #  right after instantiation, so must give it "next";
  # no instantition occurs for $tro, though!
  #my $tffor = File::Find::Object->new({}, ($contentry->name))->next_obj();
  my $tffo = File::Find::Object->new({}, ("./".$contentry->name));
  my $tffos = $tffo->next(); # just a string!
  $tffo->_calc_current_item_obj(); # unfortunately, this will not calculate dir_components ...
  my $tffor = $tffo->item_obj();
  # ->full_components doesn't call new, either!
  # must call full_compoments, to generate the fields
  #  (assign to unused variable triggers it fine)
  # however, $arrref_fullcomp will be empty, because
  #  File::Find::Object seemingly calcs dir_components only
  #  if it is traversing a tree...
  my $arrref_fullcomp = $tffor->full_components;
  #print("# ".$tffor->stat_obj->size."\n"); # seems to work
  print "> ". join(", ", @$arrref_fullcomp) ." - ". Dumper($tffor);
}
Juvenile、少年°
#3 · 2019-07-26 07:06

Perl does introspection in the sense that an object will tell you what type of object it is.

if ( $object->isa("Foo::Bar") ) {
    say "Object is of a class of Foo::Bar, or is a subclass of Foo::Bar.";
}

if ( ref $object eq "Foo::Bar" ) {
    say "Object is of the class Foo::Bar.";
}
else {
    say "Object isn't a Foo::Bar object, but may be a subclass of Foo::Bar";
}

You can also see if an object can do something:

if ( $object->can("quack") ) {
    say "Object looks like a duck!";
}

What Perl can't do directly is give you a list of all the methods that a particular object can do.

You might be able to munge something together. Perl classes are packages, and a package's namespace lives in the symbol table; methods are just Perl subroutines in that package. It may be possible to go through the package namespace and find all the subroutines.

However, I can see several issues. First, private methods (the ones you're not supposed to use) and non-method subroutines would also be included. There's no way to know which is which. Also, parent methods won't be listed.
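As a rough sketch of that symbol-table walk: the core mro module can supply the inheritance chain, so parent methods can be picked up too. The helper list_methods and the classes Animal and Duck are invented for this illustration. As noted above, private subs (and any imported functions) show up in the result as well, and methods created on the fly are missed.

```perl
use strict;
use warnings;
use mro;

package Animal;                       # hypothetical parent class
sub new { return bless {}, shift }
sub eat { return 'nom' }

package Duck;                         # hypothetical subclass
our @ISA = ('Animal');
sub quack   { return 'Quack' }
sub _waddle { return 1 }              # "private" by convention only

package main;

# Map every subroutine name visible to $class to the package defining it.
sub list_methods {
    my ($class) = @_;
    my %found;
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        for my $name (keys %{"${pkg}::"}) {
            next unless defined &{"${pkg}::$name"};
            $found{$name} //= $pkg;   # first hit wins, as in method resolution
        }
    }
    return \%found;
}

my $methods = list_methods('Duck');
printf "%-10s defined in %s\n", $_, $methods->{$_} for sort keys %$methods;
```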

Many languages can generate such a list of methods for their objects (I believe both Python and Ruby can), but these usually give you a list without an explanation what these do. For example, File::Find::Object::Result (which is returned by the next_obj method of File::Find::Object) has a base method. What does it do? Maybe it's like basename and gives me the name of the file. Nope, it's like dirname and gives me the name of the directory.

Again, some languages could give a list of those methods for an object and a description. However, those descriptions depend upon the programmer to maintain them and make sure they're correct. There's no guarantee of that.

Perl doesn't have introspection, but all Perl modules stored in CPAN must be documented via POD embedded documentation, and this is printable from the command line:

$ perldoc File::Find::Object

This is the documentation you see in CPAN pages, in http://Perldoc.perl.org and in ActiveState's Perl documentation.

It's not bad. It's not true introspection, but the documentation is usually pretty good. After all, if the documentation stunk, I probably wouldn't have installed that module in the first place. I use perldoc all the time. I can barely remember my kids' names, let alone the way to use Perl classes that I haven't used in a few months, but I find that using perldoc works pretty well.

What you should not do is use Data::Dumper to dump out objects and try to figure out what they contain and which methods they might have. Some clever programmers are using Inside-Out Objects to thwart peeping Toms.
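For the curious, a minimal sketch of why dumping fails with inside-out objects: the object itself is an empty scalar reference, and the data lives in a lexical hash keyed by the object's address. Secret::Counter is a hypothetical class written for this example.

```perl
use strict;
use warnings;
use Data::Dumper;

{
    package Secret::Counter;          # hypothetical inside-out class
    use Scalar::Util qw(refaddr);

    my %count_of;                     # per-object data, keyed by object address

    sub new {
        my ($class) = @_;
        my $self = bless \(my $anon), $class;   # the object holds no data itself
        $count_of{ refaddr $self } = 0;
        return $self;
    }
    sub increment { $count_of{ refaddr $_[0] }++; return $_[0] }
    sub count     { return $count_of{ refaddr $_[0] } }
    sub DESTROY   { delete $count_of{ refaddr $_[0] } }   # avoid leaking entries
}

my $counter = Secret::Counter->new;
$counter->increment for 1 .. 3;

my $dump = Dumper($counter);
print $dump;                                         # shows only an empty blessed scalar
print "count via method: ", $counter->count, "\n";   # the method still sees the data
```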

So no, Perl doesn't list methods of a particular class like some languages can, but perldoc comes pretty close to doing what you need. I haven't used File::Find::Object in a long while, but going over the perldoc, I probably could write up such a program without much difficulty.

smile是对你的礼貌
#4 · 2019-07-26 07:08

As I said in my answer to your previous question, it is not a good idea to rely on the guts of objects in Perl. Instead, just call methods.

If IO::All doesn't offer a method that gives you the information that you need, you might be able to write your own method for it that assembles that information using just the documented methods provided by IO::All...

use IO::All;

# Define a new method for IO::All::Base to use, but
# define it in a lexical variable!
#
my $dump_info = sub {
   use Data::Dumper ();
   my $self = shift;
   local $Data::Dumper::Terse    = 1;
   local $Data::Dumper::Sortkeys = 1;
   return Data::Dumper::Dumper {
      name    => $self->name,
      mtime   => $self->mtime,
      mode    => $self->mode,
      ctime   => $self->ctime,
   };
};

my $io = io('/tmp');
for my $file ( $io->all(0) ) {
   print $file->$dump_info();
}