How can I extract iframes from text with Perl?

Published 2019-07-17 01:47

Question:

I have some HTML text, and when I do:

print STDERR (Mojo::DOM->new($args->{'body'})->at('iframe')); 

Output:

<iframe allowfullscreen="" frameborder="0" height="360" scrolling="no" src="http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush" width="480"></iframe>

It only prints the first iframe in the body. Why doesn't it print the other iframes, and can I put all of them in an array?

Answer 1:

According to the Mojo::DOM documentation, the at method finds only the first element matching the CSS selector, so it returns a single element (or undef if nothing matches). The find method is what you are after: it returns a Mojo::Collection containing every matching element.

use strict;
use warnings;

use Mojo::DOM;

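# Parse the whole __DATA__ section at once
my $dom = Mojo::DOM->new(do { local $/; <DATA> });

# find returns a Mojo::Collection; join turns it into one printable string
print $dom->find('iframe')->join("\n"), "\n";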

__DATA__
<p>No one's telling the truth anymore, and that makes the numbers suspect.</p>
<p><iframe width="480" height="360" src="http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush" allowfullscreen="" frameborder="0" scrolling="no"></iframe></p>
<p>Instead of addressing the fact that some text</p>
<p><iframe width="480" height="360" src="http://localhost:8000/embed//static/video/2012/09/07/fnc-ff-20120907-doocytaxes" allowfullscreen="" frameborder="0" scrolling="\&quot;no\&quot;"></iframe></p>
<p>The very first example AP cites was already corrected.some text ....Reacting to recent <a href="/blog/2013/04/17/major-errors-undermine-key-argument-for-austeri">research</a> that has questions.</p>
<p><iframe width="480" height="360" src="http://localhost:8000/embed/static/clips/2013/04/29/29939/fnc-an-20130429-hemmermooredebtgdp" allowfullscreen="" frameborder="0" scrolling="no"></iframe></p>
<p>&nbsp;Arriving at such a conclusion requires not only obscuring the importance in pushing global austerity <a href="/static/images/item/gdp-components.jpg">strong measures</a> of too little spending.</p>

This prints all three iframes:

<iframe allowfullscreen="" frameborder="0" height="360" scrolling="no" src="http://localhost:8000/embed/static/clips/2012/12/17/28210/test-rush" width="480"></iframe>
<iframe allowfullscreen="" frameborder="0" height="360" scrolling="\&quot;no\&quot;" src="http://localhost:8000/embed//static/video/2012/09/07/fnc-ff-20120907-doocytaxes" width="480"></iframe>
<iframe allowfullscreen="" frameborder="0" height="360" scrolling="no" src="http://localhost:8000/embed/static/clips/2013/04/29/29939/fnc-an-20130429-hemmermooredebtgdp" width="480"></iframe>
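To see the difference between at and find in isolation, here is a minimal sketch. The two-iframe HTML string and the example.com URLs are made up for illustration, and it assumes a reasonably recent Mojolicious installed from CPAN.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical sample HTML with two iframes
my $html = '<p><iframe src="http://example.com/a"></iframe></p>'
         . '<p><iframe src="http://example.com/b"></iframe></p>';

my $dom = Mojo::DOM->new($html);

# at() returns only the first matching element (or undef if nothing matches)
my $first = $dom->at('iframe');
print "at:   ", $first->{src}, "\n";

# find() returns a Mojo::Collection holding every matching element
my $all = $dom->find('iframe');
print "find: ", $all->size, " iframes\n";
print "      $_->{src}\n" for @$all;
```

It should print the first src once for at, then a count of 2 followed by both src values for find.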

EDIT:

  1. You can iterate over the iframes with the each method of Mojo::Collection:

    my $collection = $dom->find('iframe');

    $collection->each(sub {
        my ($e, $count) = @_;    # the element and its 1-based position
        print "$count: $e\n";    # Or do something besides print.
    });
    
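  2. Since a Mojo::Collection is a blessed array reference, you can dereference it with @ and loop over it like an ordinary array:

    foreach my $iframe (@$collection) {
        # Attributes are available as hash keys on each Mojo::DOM element
        print "Next iframe src: $iframe->{src}\n";
    }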
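To cover the last part of the question (getting all iframes into an array), the collection can be flattened into plain Perl arrays. This is a minimal sketch that assumes a recent Mojolicious, where Mojo::Collection's map accepts a method name and each returns the elements as a list when called without a callback; the $body string is hypothetical sample data standing in for $args->{'body'}.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Mojo::DOM;

# Hypothetical sample input standing in for $args->{'body'}
my $body = join '',
    '<p>Intro</p>',
    '<p><iframe width="480" src="http://localhost:8000/embed/one"></iframe></p>',
    '<p><iframe width="480" src="http://localhost:8000/embed/two"></iframe></p>';

my $dom = Mojo::DOM->new($body);

# Plain Perl array of Mojo::DOM elements, one per iframe
my @iframes = $dom->find('iframe')->each;

# Plain Perl array of just the src attribute values
my @srcs = $dom->find('iframe')->map(attr => 'src')->each;

# Plain Perl array of each iframe rendered back to HTML
my @html = map { $_->to_string } @iframes;

print scalar(@iframes), " iframes found\n";
print "$_\n" for @srcs;
```

Collecting the src values with map(attr => 'src') avoids an explicit loop; keep @iframes when you need the whole element, for example to read other attributes or re-render it with to_string.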
    


Tags: perl mojo