Screen Scraping PHP using SimpleHTMLDom

Posted 2019-03-06 10:48

I'm trying to screen-scrape content in PHP and put it into an array. I need the following data, using the SimpleHTMLDom library. Reference: Parse html table using file_get_contents to php array

Desired results:

  • Hospital name, plus the background color from each cell's CSS style attribute (if it exists)
    • I need all five cells (NULL if a cell has no background color)

Array:

Hospital 1    
--> NULL    
--> #ff0000    
--> 08:50    
--> NULL    
--> NULL

Hospital 2    
--> #ffff00    
--> 08:50    
--> NULL    
--> NULL    
--> NULL
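A minimal sketch of the "background color, or NULL if there is none" rule for a single cell. It assumes the color always appears as background-color inside the cell's inline style attribute, as in the HTML sample further down. The function name backgroundColor is hypothetical; it only parses the style string, so it works with whatever parser hands you the attribute value.

```php
<?php
/**
 * Hypothetical helper: return the background-color value from an inline
 * CSS style string, or null when the style is empty or has no background-color.
 */
function backgroundColor(?string $style): ?string
{
    if ($style === null || $style === '') {
        return null;
    }
    // Capture everything after "background-color:" up to the next ";" (case-insensitive).
    if (preg_match('/background-color\s*:\s*([^;]+)/i', $style, $m)) {
        return trim($m[1]);
    }
    return null;
}

var_dump(backgroundColor('background-color:#ff0000;color:#000000;')); // string(7) "#ff0000"
var_dump(backgroundColor(''));                                        // NULL
```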

PHP:

<?php
require('simple_html_dom.php');
$table = array();

$html = file_get_html('https://www.miemssalert.com/chats/Default.aspx?hdRegion=3');
foreach($html->find('table#tblHospitals tr') as $row) {
    $hospital = $row->find('td.Chats',0)->plaintext;
    $color = $row->getAttribute('td.Chats style',2);
    $time = $row->find('td.Chats',2)->plaintext;
    //$text = $row->getAttribute('alt');

    $table[$hospital][$color][$time][$text] = true;
}

echo '<pre>';
print_r($table);
echo '</pre>';
?>
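One way to get one entry per hospital is to loop over the rows, then over the cells of each row, instead of building nested keys. The sketch below swaps SimpleHTMLDom for PHP's built-in DOMDocument/DOMXPath (an assumption, so it runs without the library) and uses a hard-coded two-row excerpt of the table instead of fetching the live URL; scrapeHospitals is a hypothetical name. The header row is skipped because it contains only th cells, and every remaining cell becomes its background color or null.

```php
<?php
// Sketch only: built-in DOM extension instead of SimpleHTMLDom (assumption),
// and a shortened inline excerpt of the hospitals table instead of the live page.
$html = <<<'HTML'
<table id="tblHospitals">
  <tr><th class="Chats">Hospital</th><th class="Chats">Yellow Alert</th><th class="Chats">Red Alert</th></tr>
  <tr><td class="Chats">Good Samaritan Hospital (MedStar)</td><td class="Chats"></td><td class="Chats" style="background-color:#ff0000;color:#000000;">08:50</td></tr>
  <tr><td class="Chats">Mercy Medical Center</td><td class="Chats" style="background-color:#ffff00;color:#000000;">08:50</td><td class="Chats"></td></tr>
</table>
HTML;

// Hypothetical helper: returns [hospital name => [color or null, ...]].
function scrapeHospitals(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate real-world HTML quirks
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $table = [];
    // Iterate the rows, not the table element itself.
    foreach ($xpath->query("//table[@id='tblHospitals']//tr") as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length === 0) {
            continue; // header row contains only <th> cells
        }
        $name = trim($cells->item(0)->textContent);
        $colors = [];
        // Every cell after the name: background color, or null if there is none.
        for ($i = 1; $i < $cells->length; $i++) {
            $style = $cells->item($i)->getAttribute('style'); // "" when absent
            $colors[] = preg_match('/background-color\s*:\s*([^;]+)/i', $style, $m)
                ? trim($m[1])
                : null;
        }
        $table[$name] = $colors;
    }
    return $table;
}

print_r(scrapeHospitals($html));
```

For the live page you would pass the fetched HTML (for example from file_get_contents) instead of the excerpt; that path is not exercised here. Note that print_r shows null as an empty value, while var_dump shows it as NULL.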

The HTML of the DOM (this is a small sample of the page):

    <div id="Page1" style="display: none; width: 100%;">
                                <div id="HospitalUpdatePanel">

                                        <table id="tblHospitals" cellspacing="0" cellpadding="1" align="Left" rules="all" border="1" style="border-color:Black;border-width:1px;border-style:Solid;width:100%;border-collapse:collapse;table-layout: fixed;">
        <tr>
            <th title="Hospital" class="Chats" style="background-color:Silver;font-weight:bold;width:25%;">Hospital</th><th title="The emergency department temporarily requests that it receive absolutely no patients in need of urgent medical care. Yellow alert is initiated because the Emergency dept is experiencing a temporary overwhelming overload such that priority II and III patients may not be managed safely. Prior to diverting pediatric patients, medical consultation is advised for pediatric patient transports when emergency departments are on yellow alert." class="Chats" style="font-weight:bold;width:9%;background-color:#ffff00;color:#000000;">Yellow Alert</th><th title="The hospital has no ECG monitored beds available. These ECG monitored beds will include all in-patient critical care areas and telemetry beds." class="Chats" style="font-weight:bold;width:9%;background-color:#ff0000;color:#000000;">Red Alert</th><th title="The emergency department reports that their facility has, in effect, suspended operation and can receive absolutely no patients due to a situation such as a power-outage, fire, gas leak, bomb scare, etc." class="Chats" style="font-weight:bold;width:9%;background-color:#006600;color:#ffffff;">Mini Disaster</th><th title="An ALS/BLS unit is being held in the emergency department of a hospital due to lack of an available bed. (This does not replace Yellow Alert.)" class="Chats" style="font-weight:bold;width:9%;background-color:#ff6600;color:#000000;">ReRoute</th><th title="The hospital's ability to function as a trauma center has been exceeded. (This decision is at the discretion of the facility.)" class="Chats" style="font-weight:bold;width:9%;background-color:#9933cc;color:#ffffff;">Trauma ByPass</th><th title="The hospital's capacity has been exceeded." class="Chats" style="font-weight:bold;width:9%;background-color:#000000;color:#ffffff;">Capacity</th>
        </tr><tr>
            <td class="Chats">Anne Arundel Medical Center</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Baltimore Washington Medical Center</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Bon Secours Hospital</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Carroll Hospital Center</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Franklin Square (MedStar)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Good Samaritan Hospital (MedStar)</td><td class="Chats"></td><td class="Chats" style="background-color:#ff0000;color:#000000;">08:50</td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Greater Baltimore Medical Center</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Harbor Hospital (MedStar)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Harford Memorial Hospital (UMUCH)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Howard County General Hospital (JHM)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Johns Hopkins Bayview Medical Center</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Johns Hopkins Hospital</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Johns Hopkins Hospital (Pediatric ED)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Mercy Medical Center</td><td class="Chats" style="background-color:#ffff00;color:#000000;">08:50</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Midtown (UM)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Northwest Hospital</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">R Adams Cowley Shock Trauma Center</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td>
        </tr><tr>
            <td class="Chats">Sinai Hospital of Baltimore</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">St. Agnes Hospital</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">St. Joseph’s  (UM)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Union Memorial Hospital  (MedStar)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">University of Maryland Medical Center</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr><tr>
            <td class="Chats">Upper Chesapeake Medical Center (UMUCH)</td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats"></td><td class="Chats-null"></td><td class="Chats-null"></td>
        </tr>
    </table>
                                        <span id="lblHospitalsErrorMessage" style="color:Red;font-weight:bold;visibility: hidden;"></span>

</div>
                            </div>

Revised PHP above. Here is the output, which is still not the desired result:

[Good Samaritan Hospital (MedStar)] => Array
    (
        [0] => Array
            (
                [11:58] => Array
                    (
                        [0] => 1
                    )

            )

    )
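The nesting in that output follows from the assignment $table[$hospital][$color][$time][$text] = true: each [...] adds one more array level and turns the value into a key. PHP also casts some key types, for example a false key becomes 0. A small illustration with made-up values (the false used for $color is an assumed stand-in for a failed attribute lookup, not a confirmed SimpleHTMLDom return value):

```php
<?php
// Chained keys: every [...] adds a nesting level, and values become keys.
$hospital = 'Good Samaritan Hospital (MedStar)';
$color = false;   // illustrative: a bool used as an array key is cast to int 0
$time = '08:50';

$nested = [];
$nested[$hospital][$color][$time] = true;
var_dump($nested);

// Flat alternative: one entry per hospital, value = list of per-column colors.
// The values here are illustrative only.
$flat = [];
$flat[$hospital] = [null, '#ff0000', null, null, null];
var_dump($flat);
```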

2 Answers
劳资没心,怎么记你
Reply #2 · 2019-03-06 11:54

It turned out to be only two lines of code:

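<?php
require('simple_html_dom.php');

// Fetch the live page and parse it into a DOM tree.
$html = file_get_html('https://www.miemssalert.com/chats/Default.aspx?hdRegion=3');

// Visit every td.Chats cell of the hospitals table in document order
// and print its text followed by its inline style attribute.
foreach($html->find('table#tblHospitals tr td.Chats') as $e)
    echo $e->plaintext . $e->getAttribute('style') . '<hr>';
?>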
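The selector in this answer visits every matching cell in document order, so the loop itself cannot tell where one row ends and the next begins. For comparison, here is a rough XPath translation of table#tblHospitals tr td.Chats using PHP's built-in DOMXPath (swapped in for SimpleHTMLDom, with a one-row inline excerpt instead of the live page). The class test matches Chats as a whole token, the way CSS .Chats does, so Chats-null cells are not selected; whether SimpleHTMLDom's class matching behaves identically is not verified here.

```php
<?php
// Inline excerpt (assumption): one row with a name, a colored cell and a Chats-null cell.
$html = '<table id="tblHospitals">'
    . '<tr><td class="Chats">Mercy Medical Center</td>'
    . '<td class="Chats" style="background-color:#ffff00;color:#000000;">08:50</td>'
    . '<td class="Chats-null"></td></tr>'
    . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings about the HTML fragment
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Whole-token class match, like CSS ".Chats" (so "Chats-null" does not match).
$query = "//table[@id='tblHospitals']//tr/td"
       . "[contains(concat(' ', normalize-space(@class), ' '), ' Chats ')]";

// Collect [text, style] pairs in document order, mirroring the echo loop above.
$cells = [];
foreach ($xpath->query($query) as $td) {
    $cells[] = [$td->textContent, $td->getAttribute('style')];
    echo $td->textContent . ' ' . $td->getAttribute('style') . PHP_EOL;
}
```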

The resulting array looks like this:

array(37) {
  ["Anne Arundel Medical Center"]=>
  array(1) {
    [0]=>
    bool(true)
  }
  [""]=>
  array(1) {
    [0]=>
    bool(true)
  }
  ["Baltimore Washington Medical Center"]=>
  array(1) {
    [0]=>
    bool(true)
  }
  ["04:31"]=>
  array(1) {
    ["background-color:#ffff00;color:#000000;"]=>
    bool(true)
  }
  ["Bon Secours Hospital"]=>
  array(1) {
    [0]=>
    bool(true)
  }
戒情不戒烟
Reply #3 · 2019-03-06 11:55

There are several problems with the posted code:

  1. The find() method takes a CSS selector, not HTML markup. If you want to find <table id="tblHospitals">, use table#tblHospitals and so on.
  2. foreach($html->find('table#tblHospitals') as $row) would iterate over a single table element, not its rows. You probably want a selector that matches the actual row elements, e.g. table#tblHospitals tr
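Point 2 can be checked directly: a selector for the table returns one element, while a selector for its rows returns one element per tr. Sketched with PHP's built-in DOMXPath and a three-row inline table (both assumptions; the XPath expressions stand in for the CSS selectors table#tblHospitals and table#tblHospitals tr):

```php
<?php
// Inline three-row table (assumption): one header row and two hospital rows.
$html = '<table id="tblHospitals">'
    . '<tr><th>Hospital</th></tr>'
    . '<tr><td>Anne Arundel Medical Center</td></tr>'
    . '<tr><td>Mercy Medical Center</td></tr>'
    . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Selecting the table yields one node, so a foreach over it runs once.
$tables = $xpath->query("//table[@id='tblHospitals']");
// Selecting the rows yields one node per <tr>, header row included.
$rows = $xpath->query("//table[@id='tblHospitals']//tr");

echo $tables->length, ' table, ', $rows->length, " rows\n"; // 1 table, 3 rows
```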