I am trying to parse html page of Google play
and getting some information about apps. Simple-html-dom works perfect, but if page contains code without spaces, it completely ingnores attributes. For instance, I have html code:
<div class="doc-banner-icon"><img itemprop="image"src="https://lh5.ggpht.com/iRd4LyD13y5hdAkpGRSb0PWwFrfU8qfswGNY2wWYw9z9hcyYfhU9uVbmhJ1uqU7vbfw=w124"/></div>
As you can see, there is no any spaces between image
and src
, so simple-html-dom ignores src
attribute and returns only <img itemprop="image">
. If I add space, it works perfectly. To get this attribute I use the following code:
foreach($html->find('div.doc-banner-icon') as $e){
foreach($e->find('img') as $i){
$bannerIcon = $i->src;
}
}
My question is how to change this beautiful library to get full inner text of this div
?
I just create function which adds neccessary spaces to content:
And then use it after
file_get_contents
function. So:Hope it helps to someone else.