I have a condition where I want to retrieve text from a specific tag, but it does not seem to be returning true. Any help?
#!/usr/bin/perl
use HTML::TreeBuilder;
use warnings;
use strict;
my $URL = "http://prospectus.ulster.ac.uk/modules/index/index/selCampus/JN/selProgramme/2132/hModuleCode/COM137";
my $tree = HTML::TreeBuilder->new_from_content($URL);
if (my $div = $tree->look_down(_tag => "div ", class => "col col60 moduledetail")) {
printf $div->as_text();
print "test";
open (FILE, '>mytest.txt');
print FILE $div;
close (FILE);
}
print $tree->look_down(_tag => "th", class => "moduleCode")->as_text();
$tree->delete();
It is not getting into the if statement, and the print outside the if statement reports an undefined value, but I know it should be returning true because these tags do exist.
<th class="moduleCode">COM137<small>CRN: 33413</small></th>
thanks
You are calling HTML::TreeBuilder->new_from_content, yet you are supplying a URL instead of content. You have to get the HTML before you can pass it to HTML::TreeBuilder.

Perhaps the simplest way is to use LWP::Simple, which imports a subroutine called get. This will read the data at the URL and return it as a string.

The reason your conditional block is never executed is that you have a space in the tag name. You need "div" instead of "div ".

Also note the following:
- You shouldn't output a single string by using printf with that string as a format specifier. If the text contains a % character, it may generate missing argument warnings and fail to output the string properly.
- You should ideally use lexical file handles and the three-argument form of open. You should also check the status of all open calls and respond accordingly.
- Your scalar variable $div is a blessed hash reference, so printing it as it is will output something like HTML::Element=HASH(0xfffffff). You need to call its methods to extract the values you want to display.

With these errors corrected your code looks like this, although I haven't formatted the output as I can't tell what you want.
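As a rough sketch of how those corrections fit together, the program below parses a small inline HTML sample rather than fetching the page, so it runs offline. The sample markup is an assumption based on the tags quoted in the question, not the real page. In the actual program the string would come from LWP::Simple, along the lines of my $html = get($URL) or die "Unable to fetch $URL";

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Assumed stand-in for the real page, built from the tags in the question.
# In the real program, fetch this string with LWP::Simple's get($URL).
my $html = <<'END_HTML';
<html><body>
<div class="col col60 moduledetail">
<table><tr><th class="moduleCode">COM137<small>CRN: 33413</small></th></tr></table>
</div>
</body></html>
END_HTML

# new_from_content expects HTML text, not a URL
my $tree = HTML::TreeBuilder->new_from_content($html);

# The tag name is "div" with no trailing space
if (my $div = $tree->look_down(_tag => 'div', class => 'col col60 moduledetail')) {
    # Call a method on the element instead of printing the object itself
    my $text = $div->as_text;

    # print, not printf, for a plain string
    print $text, "\n";

    # Lexical file handle, three-argument open, and a checked result
    open my $fh, '>', 'mytest.txt' or die "Unable to open mytest.txt: $!";
    print $fh $text, "\n";
    close $fh or die "Unable to close mytest.txt: $!";
}

# Guard against look_down finding nothing before calling a method on the result
my $th = $tree->look_down(_tag => 'th', class => 'moduleCode');
print $th->as_text, "\n" if $th;

$tree->delete;
```

Checking $th before calling as_text avoids the undefined value error from the original program when the element isn't found.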