Pull HTML content from remote website and display

Posted 2019-02-11 07:49

Question:

Been working on this for a little while now and am stumped. I am attempting to pull the content from within a specific div on a remote website page and then insert that html into a div on my own website. I know that you cannot solely use jQuery's .ajax, .load, or .get methods for this type of operation.

Here's the remote page's HTML:

<html>
    <body>
        <div class="entry-content">
            <table class="table">
                ...table #1 content...
                ...More table content...
            </table>
            <table class="table">
                ...table #2 content...
            </table>
            <table class="table">
                ...table #3 content...
            </table>
        </div>
    </body>
</html>

Goal: I am attempting to fetch the html from the remote page's first table. So, on my website, I would like the following html to be fetched and placed in a div of id="fetched-html":

<table class="table">
    ...table #1 content...
    ...More table content...
</table>

Here's where I'm at with my PHP function thus far:

<?php
function pullRaspi_SDImageTable() {
    $url = "http://www.raspberrypi.org/downloads";
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($curl);
    curl_close($curl);

    // Create new PHP DOM document
    $DOM = new DOMDocument;
    // Load html from curl request into document model
    $DOM->loadHTML($output);

    // Get 1st table
    $output = $DOM->firstChild->getElementsByTagName('table');

    return $output;
}
?>

The final result should look like this on my local website page:

<div id="fetched-html">
    <table class="table">
        ...table #1 content...
        ...More table content...
    </table>
</div>

Here's another PHP function possibility?

<?php
function pullRaspPi_SDImageTable() {
    // Url to fetch
    $url = "http://www.raspberrypi.org/downloads";

    $ch = curl_init($url);
    $fp = fopen("raspberrypi_sdimagetable.txt", "w");
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);

    // Write html source to variable
    $rasp_sdimagetable = curl_exec($ch);

    // Close curl request
    curl_close($ch);

    return $rasp_sdimagetable;
}

// Then in the head of the html, add this jQuery:
<script type="text/javascript">
    $("#fetched-html").load("<?php pullRaspPi_SDImageTable(); ?> table.table:first");
</script>

Problem is, neither function works. :( Any thoughts?

Answer 1:

Extracting a fragment of HTML from a website is a breeze with simplehtmldom. You can do something like:

function pullRaspi_SDImageTable() {
    $filename = '/tmp/downloads.html';  // Where you want to cache the result
    $expiry = 600;  // Cache lifetime in seconds (10 minutes)
    $output = '';

    if (!file_exists($filename) || time() - $expiry > filemtime($filename)) {
        // The cache is missing or stale, so fetch the results from the remote server
        require_once('simple_html_dom.php');
        $html = file_get_html('http://www.raspberrypi.org/downloads');
        foreach ($html->find('div.entry-content table.table') as $elem) {
            $output .= (string)$elem;
        }

        // Store the cache
        file_put_contents($filename, $output);
    } else {
        // Pull the content from the cache
        $output = file_get_contents($filename);
    }

    return $output;
}

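This gives you the HTML of every table.table element inside div.entry-content, cached for 10 minutes. To get only the first table, as the question asks, pass an index to find: $html->find('div.entry-content table.table', 0).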
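
For comparison, the extraction can also be sketched with PHP's built-in DOMDocument and DOMXPath, which is what the question's first attempt was reaching for. One reason that attempt cannot be echoed directly is that getElementsByTagName returns a DOMNodeList rather than an HTML string. The function name extractFirstEntryTable is hypothetical, and the sketch assumes the remote markup has the div.entry-content structure shown in the question:

```php
<?php
// Hypothetical helper (not from the original answers): returns the outer HTML
// of the first <table> inside <div class="entry-content">, or null if absent.
function extractFirstEntryTable(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Real-world pages are rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    // Match "entry-content" as a whole class token, then take the first table in document order.
    $query = '(//div[contains(concat(" ", normalize-space(@class), " "), " entry-content ")]//table)[1]';
    $tables = $xpath->query($query);

    if ($tables === false || $tables->length === 0) {
        return null;
    }

    // saveHTML($node) serializes just that node, tags included.
    return $dom->saveHTML($tables->item(0));
}
```

The returned string could then be echoed server-side inside the div with id="fetched-html", so no cross-domain request from the browser is needed.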



回答2:

"you cannot solely use jQuery's .ajax, .load, or .get methods for this type of operation"

Yes, you can, but the remote website must authorize it, typically by sending CORS headers that allow your origin. Similarly, you could insert an iframe and read it with the normal DOM functions, but only if no cross-domain restrictions apply.

With PHP alone you can get a full page (using common functions such as include or require and passing the website URL), but in the same way you need to be authorized.
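
As a hedged alternative to include or require, which would also execute any PHP code found in the response, the page can be read as a plain string with file_get_contents. The sketch below assumes allow_url_fopen is enabled for remote URLs. The function name fetchPageAsString and the 10-second default timeout are illustrative choices, not part of the answer above:

```php
<?php
// Hypothetical helper (not from the original answer): reads a page into a
// string, returning null on failure. Remote URLs need allow_url_fopen enabled.
function fetchPageAsString(string $url, int $timeoutSeconds = 10): ?string
{
    // These options apply only to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout' => $timeoutSeconds,
            'follow_location' => 1,
        ],
    ]);

    // Suppress the warning on failure and signal it through the return value instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Because the fetch runs on your own server, the browser's cross-domain restrictions do not come into play. The returned string can be parsed and echoed into the page.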