How to fetch content from a webpage?

Posted 2020-07-23 03:47

I want to fetch the content of a div from a webpage and use it in my page.

I have the URL http://www.freebase.com/search?limit=30&start=0&query=cancer
I want to fetch the content of the div with id artilce-1001. How can I do that in PHP or jQuery?

Tags: php jquery
6 answers
叛逆
#2 · 2020-07-23 04:11

In PHP you'll probably want to GET the page (using cURL or similar) and then parse the HTML. Parsing is not the easiest thing to do, but there are libraries that can help with that.
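As a rough illustration of the fetch-then-parse approach described above, here is a minimal sketch using PHP's bundled cURL and DOM extensions. The function names fetch_html and div_text_by_id are made up for this example, and the 10 second timeout is an arbitrary choice.

```php
<?php
// Fetch a page over HTTP with cURL; throws if the transfer fails.
function fetch_html(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // seconds (arbitrary value for this sketch)
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

// Return the trimmed text of the first <div> with the given id, or null if absent.
function div_text_by_id(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('id') === $id) {
            return trim($div->textContent);
        }
    }
    return null;
}

// Usage (performs a network request, so it is left commented out):
// $text = div_text_by_id(fetch_html('http://www.freebase.com/search?limit=30&start=0&query=cancer'), 'artilce-1001');
```

Keeping the fetch and the parse in separate functions makes it possible to try the parsing step on a saved copy of the page before involving the network.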

【Aperson】
#3 · 2020-07-23 04:16

If you want to use PHP, you may want to have a look at Simple HTML DOM. It is a nice single include file. The docs give an example of scraping slashdot as:

$html = file_get_html('http://slashdot.org/');

// Find all article blocks
foreach($html->find('div.article') as $article) {
    $item['title']     = $article->find('div.title', 0)->plaintext;
    $item['intro']    = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

Regex is not a good tool for parsing HTML and should not be used for it. HTML isn't a regular language, and you end up with huge regular expressions for what would be simple in jQuery or the above library.

EDIT:
So you would want to use something like

$html = file_get_html('http://www.freebase.com/search?limit=30&start=0&query=cancer');
$text = $html->find('div[id=artilce-1001]',0)->plaintext;
Lonely孤独者°
#4 · 2020-07-23 04:30

PHP is server-side and jQuery is client-side, so it really depends on what you want to achieve. Also note that because of the same-origin policy, you generally can't perform an Ajax request to another domain via JavaScript anyway (but you could proxy it via your own server).

jQuery aside, here's a simple way to do it in PHP, which will work for the case you provide

$url="http://www.freebase.com/search?limit=30&start=0&query=cancer";
$html=file_get_contents($url);

if (preg_match('{<div id="article-1001".*?>(.*?)</div>}s', $html, $matches))
{
    $content=$matches[1];
}

Note the 's' modifier, which makes . match newlines, and the .*? idiom, which makes the inner match non-greedy so that it stops at the first </div> it finds.
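To make the non-greedy behaviour concrete, the following sketch runs the same pattern against two small, made-up HTML strings (they are not taken from the Freebase page). It shows the pattern working on flat markup and cutting the content short as soon as the div contains another div.

```php
<?php
$pattern = '{<div id="article-1001".*?>(.*?)</div>}s';

// Flat markup: the capture group holds the whole body of the div,
// including the newlines that the 's' modifier lets . match.
$flat = "<div id=\"article-1001\">\n  Some text\n</div>";
preg_match($pattern, $flat, $m);
$flatBody = trim($m[1]); // "Some text"

// Nested markup: the non-greedy match stops at the first </div>, which
// closes the inner div, so " More text" is silently dropped.
$nested = '<div id="article-1001"><div class="intro">Intro</div> More text</div>';
preg_match($pattern, $nested, $m);
$nestedBody = $m[1]; // '<div class="intro">Intro'
```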

This works for your case, but regexes are generally ill-suited to this task. You could load the HTML into a DOMDocument and access it that way.

$doc = new DOMDocument();
$doc->loadHTML($html);
$div=$doc->getElementById("article-1001");
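The snippet above stops once it has the node. One possible way to get the markup inside it is sketched below. The helper name inner_html_by_id is hypothetical, and the libxml_use_internal_errors() calls are an addition to keep warnings about sloppy markup out of the output.

```php
<?php
// Return the inner HTML of the element with the given id, or null if not found.
function inner_html_by_id(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // collect parse warnings silently
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $div = $doc->getElementById($id);
    if ($div === null) {
        return null;
    }

    // Serialize each child node; nested divs are handled by the parser.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}
```

If only the text is needed, $div->textContent is enough and the loop can be dropped.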
唯我独甜
#5 · 2020-07-23 04:31

PHP:

$content = file_get_contents('http://www.freebase.com/search?limit=30&start=0&query=cancer');

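$match = preg_match('#<div[^>]*id="article-1001"[^>]*>(.*?)</div>#s', $content, $matches);

This regular expression is untested and may need adjusting (for example, it stops at the first </div>, so nested divs will break it), but it shows the direction you can take; just play with it :)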

我欲成王,谁敢阻挡
#6 · 2020-07-23 04:32

If this really is about a Freebase topic and not about getting HTML from a website in general, using the API and getting familiar with MQL should be the better solution, since that would allow you to restrict your search to specific types easily.

Example:

[{
  "/common/topic/article": {
    "guid":     null,
    "limit":    1,
    "optional": true
  },
  "/common/topic/image": {
    "id":       null,
    "limit":    1,
    "optional": true
  },
  "id":     null,
  "name":   null,
  "name~=": "*Cancer*",
  "type":   "/user/radiusrs/default_domain/astrological_sign"
}]

This can be passed to mqlread directly and returns a JSON list of possible matches for the astrological sign "Cancer". You can then get the article and image by using trans_raw and/or trans_blurb, if you need to. :)

劳资没心,怎么记你
#7 · 2020-07-23 04:38

Use the following

$("#LoadIntoThisDiv").load("http://www.freebase.com/search?limit=30&start=0&query=cancer #artilce-1001");

There is an example like this in the jQuery documentation.
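As an earlier answer points out, the same-origin policy generally prevents the browser from loading a page from another domain this way, and proxying through your own server is the suggested workaround. A minimal sketch of such a proxy follows. The file name proxy.php and the function name extract_fragment are invented for this example, and it assumes allow_url_fopen is enabled so that file_get_contents() can fetch URLs.

```php
<?php
// proxy.php (hypothetical file name): serves one fixed remote fragment.
// The URL is hard-coded so the script cannot be abused as an open proxy.
const REMOTE_URL = 'http://www.freebase.com/search?limit=30&start=0&query=cancer';
const DIV_ID     = 'artilce-1001';

// Return the outer HTML of the element with the given id, or '' if missing.
function extract_fragment(string $html, string $id): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    return $node === null ? '' : $doc->saveHTML($node);
}

// Only act as an HTTP endpoint when served by a web server, not from the CLI.
if (PHP_SAPI !== 'cli') {
    $html = @file_get_contents(REMOTE_URL);
    if ($html === false) {
        http_response_code(502); // the upstream fetch failed
        exit('Upstream request failed');
    }
    header('Content-Type: text/html; charset=utf-8');
    echo extract_fragment($html, DIV_ID);
}
```

With this file on your own domain, the jQuery call would become $("#LoadIntoThisDiv").load("proxy.php"); and no longer cross origins.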
