I am parsing large XML files (60GB+) with XML::Twig and using it in an OO (Moose) script. I am using the twig_handlers
option to parse elements as soon as they're read into memory. However, I'm not sure how I can deal with the Element and Twig.
Before I used Moose (and OO altogether), my script looked as follows (and worked):
    my $twig = XML::Twig->new(
        twig_handlers => {
            $outer_tag => \&_process_tree,
        }
    );
    $twig->parsefile($input_file);

    sub _process_tree {
        my ($fulltwig, $twig) = @_;
        $twig->cut;
        $fulltwig->purge;
        # Do stuff with twig
    }
And now I'd do it like this.
    my $twig = XML::Twig->new(
        twig_handlers => {
            $self->outer_tag => sub {
                $self->_process_tree($_);
            }
        }
    );
    $twig->parsefile($self->input_file);

    sub _process_tree {
        my ($self, $twig) = @_;
        $twig->cut;
        # Do stuff with twig
        # But now the 'full twig' is not purged
    }
The thing is that I now see that I am missing the purging of the fulltwig. I figured that, in the first, non-OO version, purging would help save memory by getting rid of the fulltwig as soon as I can. However, when using OO (and having to rely on an explicit sub {} inside the handler), I don't see how I can purge the full twig, because the documentation says that
$_ is also set to the element, so it is easy to write inline handlers like
para => sub { $_->set_tag( 'p'); }
So they talk about the Element you want to process, but not the fulltwig itself. So how can I delete that if it is not passed to the subroutine?
The handler still gets the full twig; you're just not using it (you're using $_ instead).
As it turns out, you can still call purge on the twig (which I usually call "element", or elt in the docs): $_->purge will work as expected, purging the full twig up to the current element in $_.
A cleaner (IMHO) way would be to actually get all of the parameters and purge the full twig explicitly:
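Here is a minimal sketch of that explicit form. The class name My::Parser, the seen counter, the parse_string method and the sample XML are made up for illustration; only the handler signature (twig first, element second) and the purge call come from XML::Twig. The element is processed first and the full twig is purged afterwards, so the handler never touches freed data.

```perl
use strict;
use warnings;

package My::Parser {    # hypothetical class name, for illustration only
    use Moose;
    use XML::Twig;

    has outer_tag => (is => 'ro', isa => 'Str', required => 1);
    has seen      => (is => 'rw', isa => 'Int', default  => 0);   # illustrative counter

    sub parse_string {
        my ($self, $xml) = @_;
        my $tag = $self->outer_tag;

        my $twig = XML::Twig->new(
            twig_handlers => {
                $tag => sub {
                    # XML::Twig passes the full twig first, then the element
                    my ($fulltwig, $elt) = @_;
                    $self->_process_tree($elt);
                    # Free everything that has been parsed so far
                    $fulltwig->purge;
                },
            },
        );
        $twig->parse($xml);
        return $self->seen;
    }

    sub _process_tree {
        my ($self, $elt) = @_;
        # Placeholder for the real work: just count the elements handled
        $self->seen($self->seen + 1);
    }
}

my $parser = My::Parser->new(outer_tag => 'record');
my $count  = $parser->parse_string(
    '<root><record>a</record><record>b</record><record>c</record></root>'
);
print "$count\n";
```

Because the anonymous sub closes over $self, the object stays available inside the handler without being passed as an argument. For a real 60GB file you would call parsefile instead of parse.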