Removing unwanted elements from table simple_html_

Posted 2019-09-10 13:04

Question:

I am fetching a page that contains some style tags, a table, and other non-vital content. I'm storing it in a transient and fetching it all with AJAX.

$result_match = file_get_contents( 'www.example.com' );

set_transient( 'match_results_details', $result_match, 60 * 60 * 12 );

$match_results = get_transient( 'match_results_details' );

if ( $match_results != '') {

    $html = new simple_html_dom();
    $html->load($match_results);

    $out = '';

    $out .= '<div class="match_info_container">';
    if (!empty($html) && is_object($html)) {
        foreach ($html->find('table') as $table => $table_value) {
            $out .= preg_replace('/href="?([^">]+)"/', '', $table_value);
        }
    }
    $out .= '</div>';

    wp_die ( $out );

} else {
    $no_match_info = esc_html__('No info available', 'kompisligan');
    wp_die($no_match_info);
}

The table contained anchors that I needed to neutralize, so I used preg_replace to find every href attribute and strip it out. I know that you can manipulate the contents with the find() method, but I had no success with that.

Now I would like to get rid of the entire <tfoot> element, including everything it contains.

But every time I try to 'find' something, the AJAX call returns an error, which means something in my code is wrong.

How do I manipulate the contents of an already found element with simple_html_dom? I tried outputting the contents of $html to see what I would get, but my AJAX call runs forever and I never get a response.

Answer 1:

You could try this, using the built-in DOMDocument instead of simple_html_dom. However, if your AJAX call is timing out, the problem might lie elsewhere (for example, example.com failing to load).
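Before the full handler, here is a minimal standalone sketch of the same idea that runs outside WordPress, which may help rule out the AJAX layer as the source of the error. The sample markup and the function name `clean_tables` are hypothetical, made up for illustration; only the DOMDocument and libxml calls come from PHP itself.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$sample = '<table><thead><tr><th>Team</th></tr></thead>'
        . '<tfoot><tr><td>Footer</td></tr></tfoot>'
        . '<tbody><tr><td><a href="/team/1">Reds</a></td></tr></tbody></table>';

// Hypothetical helper: returns the markup of every <table> in $markup,
// with <tfoot> elements removed and href attributes stripped from anchors.
function clean_tables(string $markup): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($markup);
    libxml_clear_errors();

    // getElementsByTagName() returns a live list, so take a snapshot
    // before removing nodes from the document.
    foreach (iterator_to_array($doc->getElementsByTagName('tfoot')) as $tfoot) {
        $tfoot->parentNode->removeChild($tfoot);
    }

    // Changing attributes does not alter the list, so no snapshot is needed here.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $anchor->removeAttribute('href');
    }

    $out = '';
    foreach ($doc->getElementsByTagName('table') as $table) {
        $out .= $doc->saveHTML($table);   // serialize the node together with its tags
    }
    return $out;
}

$cleaned = clean_tables($sample);
echo $cleaned, PHP_EOL;
```

Running this from the command line with `php` shows the cleaned markup directly, without going through wp_die or the browser.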

if ( $match_results != '') {

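    $html = new DOMDocument();
    // Collect libxml parse warnings internally instead of silencing them with @
    libxml_use_internal_errors(true);
    $html->loadHTML($match_results);
    libxml_clear_errors();

    $out = '<div class="match_info_container">';

    // Remove the "href" attribute from every <a>
    foreach ($html->getElementsByTagName('a') as $anchor) {
        $anchor->removeAttribute('href');
    }

    // Remove every <tfoot>. Copy the live node list to an array first,
    // because removing nodes while iterating over it skips elements.
    foreach (iterator_to_array($html->getElementsByTagName('tfoot')) as $tfoot) {
        $tfoot->parentNode->removeChild($tfoot);
    }

    // Append the markup of every <table> to the div.
    // saveHTML($node) keeps the tags; nodeValue would return only the text.
    foreach ($html->getElementsByTagName('table') as $table) {
        $out .= $html->saveHTML($table);
    }

    $out .= '</div>';

    wp_die( $out );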

} else {