I am trying to use Web::Scraper to parse the following HTML:
<div>
<p><strong>TITLE1</strong>
<br>
DESCRIPTION1
</p>
<p><strong>TITLE2</strong>
<br>
DESCRIPTION2
</p>
<p><strong>TITLE3</strong>
<br>
DESCRIPTION3
</p>
</div>
into the following structure:
'test' => [
    {
        'name' => 'TITLE1',
        'desc' => 'DESCRIPTION1 '
    },
    {
        'name' => 'TITLE2',
        'desc' => 'DESCRIPTION2 '
    },
    {
        'name' => 'TITLE3',
        'desc' => 'DESCRIPTION3 '
    }
]
I have the following code, but I haven't had much luck. Using 'TEXT' on 'p' returns the description together with the text inside "strong", for example:
'test' => [
    {
        'name' => 'TITLE1',
        'desc' => 'TITLE1 DESCRIPTION1 '
    }
]
Plus, it only returns the first item.
Here is my code.
use strict;
use Web::Scraper;
use Data::Dumper;

my $html = q[<div>
<p><strong>TITLE1</strong>
<br>
DESCRIPTION1
</p>
<p><strong>TITLE2</strong>
<br>
DESCRIPTION2
</p>
<p><strong>TITLE3</strong>
<br>
DESCRIPTION3
</p>
</div>
];

my $test = scraper {
    process 'div', 'test[]' => scraper {
        process 'p strong', 'name' => 'TEXT';
        process 'p', 'desc' => 'TEXT';
    };
};

my $res = $test->scrape(\$html);
print Dumper($res);
Thank you.
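There are two points in your code that need changing.
To get only the DESCRIPTION text, use XPath. 'TEXT' on a 'p' returns the text of the whole element, including its descendants such as the strong. '//p/text()' instead selects only the text nodes directly under any p, so the text inside the strong is not included.
To make every p block show up in the array, and not only the first one, make the outer process act on 'div p'. Your outer process matches the single div, so 'test[]' gets exactly one entry, and the scalar keys 'name' and 'desc' inside it keep only the first match. With 'div p', the outer process grabs every p inside a div and produces one entry per p, not one for the whole div.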
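A minimal sketch of the scraper with both changes applied, assuming Web::Scraper's default HTML::TreeBuilder::XPath backend. It is not the answer's original code. One assumption differs: it uses the key 'desc[]' rather than 'desc', collecting every text node directly under the p, because the whitespace between </strong> and <br> may form a text node of its own and a scalar key would keep only that first node. The loop after scraping, which joins and trims those nodes, is an added cleanup step.

```perl
use strict;
use warnings;
use Web::Scraper;
use Data::Dumper;

my $html = q[<div>
<p><strong>TITLE1</strong>
<br>
DESCRIPTION1
</p>
<p><strong>TITLE2</strong>
<br>
DESCRIPTION2
</p>
<p><strong>TITLE3</strong>
<br>
DESCRIPTION3
</p>
</div>
];

my $test = scraper {
    # 'div p' matches every <p> inside the <div>, so 'test[]' gets one entry per <p>
    process 'div p', 'test[]' => scraper {
        # In the nested scraper the current <p> is the root of the tree
        process 'strong', 'name' => 'TEXT';
        # XPath: only text nodes that are direct children of <p>, so the
        # <strong> text is excluded; 'desc[]' keeps all of them (assumption, see above)
        process '//p/text()', 'desc[]' => 'TEXT';
    };
};

my $res = $test->scrape(\$html);

# Added cleanup: join the collected text nodes and normalise whitespace
for my $item (@{ $res->{test} }) {
    my $desc = join ' ', @{ $item->{desc} || [] };
    $desc =~ s/\s+/ /g;         # collapse runs of whitespace to one space
    $desc =~ s/^\s+|\s+$//g;    # trim both ends
    $item->{desc} = $desc;
}

print Dumper($res);
```

If you want the trailing space shown in your desired structure, drop the trimming line; the XPath selection itself is what keeps the TITLE text out of 'desc'.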