How to fetch content from a webpage?

I want to fetch the content of a div from another webpage and use it in my own page.

I have the URL http://www.freebase.com/search?limit=30&start=0&query=cancer
and I want the content of the div with id article-1001. How can I do that in PHP or jQuery?

Tags: php jquery

6 answers

叛逆

Reply #2 · 2020-07-23 04:11

In PHP you'll probably want to GET the page (using cURL or similar) and then parse the HTML. Parsing HTML by hand is not the easiest thing to do, but there are libraries that help with it.
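A minimal sketch of the "GET the page with cURL" step, assuming PHP 7+ with the curl extension installed. The helper name fetch_html() and the user-agent string are hypothetical; parsing the returned HTML is a separate step.

```php
<?php
// Hypothetical helper: download a page with cURL and return its body,
// throwing on network errors or HTTP error statuses.
function fetch_html(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-fetcher)', // assumed UA
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }

    // 0 for non-HTTP URLs; 4xx/5xx means the server refused or failed.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP error $status for $url");
    }
    return $body; // the cURL handle is released when it goes out of scope
}
```

Usage would be $html = fetch_html('http://www.freebase.com/search?limit=30&start=0&query=cancer'); followed by whichever parser you choose.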


【Aperson】

Reply #3 · 2020-07-23 04:16

If you want to use PHP, have a look at Simple HTML DOM. It is a single include file, and its documentation gives this example of scraping Slashdot:

include 'simple_html_dom.php';

$html = file_get_html('http://slashdot.org/');
$articles = array();

// Find all article blocks
foreach ($html->find('div.article') as $article) {
    $item = array();
    $item['title']   = $article->find('div.title', 0)->plaintext;
    $item['intro']   = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

Regular expressions are a poor tool for parsing HTML (and should not be used for it). HTML is not a regular language, so you end up with huge, fragile expressions for what would be simple in jQuery or with the library above.

EDIT:
For your page, you would use something like:

$html = file_get_html('http://www.freebase.com/search?limit=30&start=0&query=cancer');
$div  = $html ? $html->find('div[id=article-1001]', 0) : null;  // null if the fetch or the lookup fails
$text = $div ? $div->plaintext : '';


Lonely孤独者°

Reply #4 · 2020-07-23 04:30

PHP runs on the server and jQuery runs in the browser, so the right choice depends on what you want to achieve. Also note that because of the same-origin policy, JavaScript generally can't make an Ajax request to another domain anyway (but you can proxy the request through your own server).
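To make the proxy idea concrete, here is a minimal sketch of a same-origin proxy script, assuming PHP 7+. The file name proxy.php, the url query parameter, and the host allowlist are all hypothetical choices; restricting hosts keeps the script from becoming an open proxy that anyone could use to fetch arbitrary URLs.

```php
<?php
// proxy.php (hypothetical name): fetches an allowlisted URL server-side
// and echoes it, so the browser only ever talks to your own domain.
const ALLOWED_HOSTS = ['www.freebase.com']; // assumed allowlist

function is_allowed_url(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // relative paths, file:///... and garbage are rejected
    }
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true)
        && in_array(strtolower($parts['host']), ALLOWED_HOSTS, true);
}

// Only handle a request when running under a web server, not the CLI.
if (PHP_SAPI !== 'cli') {
    $url = $_GET['url'] ?? '';
    if (!is_allowed_url($url)) {
        http_response_code(403);
        exit('URL not allowed');
    }
    $html = file_get_contents($url);
    if ($html === false) {
        http_response_code(502);
        exit('Upstream fetch failed');
    }
    header('Content-Type: text/html; charset=utf-8');
    echo $html;
}
```

The page would then request proxy.php?url= followed by the URL-encoded target from its own domain, so a jQuery call against it stays same-origin.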

jQuery aside, here's a simple way to do it in PHP that works for the case you describe:

$url  = "http://www.freebase.com/search?limit=30&start=0&query=cancer";
$html = file_get_contents($url);

$content = null;
if ($html !== false && preg_match('{<div id="article-1001".*?>(.*?)</div>}s', $html, $matches))
{
    $content = $matches[1];
}

Note the s modifier, which makes . match newlines, and the .*? idiom, which makes the match of the inner part non-greedy so that it stops at the first </div> that follows.
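Stopping at the first </div> is also where this pattern breaks: if the target div contains a nested div, the match ends at the child's closing tag. A small self-contained demonstration, using a made-up HTML snippet:

```php
<?php
// Made-up markup: the target div contains a nested div.
$html = '<div id="article-1001"><div class="title">Cancer</div><p>Body text</p></div>';

// Same pattern as above; the non-greedy (.*?) ends at the FIRST </div>,
// which here belongs to the nested title div.
preg_match('{<div id="article-1001".*?>(.*?)</div>}s', $html, $matches);
echo $matches[1], "\n"; // prints: <div class="title">Cancer
```

The paragraph of body text is lost, which is why a real parser is the safer choice once the markup is not trivially flat.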

This works for your case, but regexes are generally ill-suited to this task. You could instead load the HTML into a DOMDocument and access the element that way:

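libxml_use_internal_errors(true); // real-world HTML triggers many parser warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$div = $doc->getElementById("article-1001");
$content = $div ? $div->textContent : null;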
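A slightly fuller sketch of the DOMDocument approach, assuming PHP 7.1+ with the dom extension. extract_div() is a hypothetical helper that returns both the plain text and the inner HTML of the element, or null when no element has that id. Unlike the regex, it handles the nested-div case correctly.

```php
<?php
// Hypothetical helper: find an element by id and return its text and inner HTML.
function extract_div(string $html, string $id): ?array
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $div = $doc->getElementById($id);
    if ($div === null) {
        return null;
    }

    // Serialize each child node to rebuild the element's inner HTML.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return ['text' => $div->textContent, 'html' => $inner];
}

$html = '<div id="article-1001"><div class="title">Cancer</div><p>Body text</p></div>';
$result = extract_div($html, 'article-1001');
echo $result['text'], "\n"; // prints: CancerBody text
```

One caveat: loadHTML() assumes ISO-8859-1 unless the page declares its charset, so non-ASCII text from a UTF-8 page without a charset declaration may come out garbled.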


唯我独甜

Reply #5 · 2020-07-23 04:31

PHP:

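$content = file_get_contents('http://www.freebase.com/search?limit=30&start=0&query=cancer');

// On success, $matches[1] holds the div's inner HTML
$found = preg_match('#<div[^>]*id="article-1001"[^>]*>(.*?)</div>#s', $content, $matches);

This regular expression may still fail on real markup (for example, when the div contains nested divs), but it shows a direction you can take; adjust it to the page's actual HTML. :)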


我欲成王，谁敢阻挡

Reply #6 · 2020-07-23 04:32

If this really is about Freebase topics rather than getting HTML from websites in general, using the Freebase API and learning MQL is the better solution, since it lets you easily restrict your search to specific types.

Example:

[{
  "/common/topic/article": {
    "guid":     null,
    "limit":    1,
    "optional": true
  },
  "/common/topic/image": {
    "id":       null,
    "limit":    1,
    "optional": true
  },
  "id":     null,
  "name":   null,
  "name~=": "*Cancer*",
  "type":   "/user/radiusrs/default_domain/astrological_sign"
}]

This query can be passed directly to mqlread, which returns a JSON list of possible matches for the astrological sign "Cancer". You can then fetch the article and image through trans_raw and/or trans_blurb if you need them. :)
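To send that query from PHP, it can help to build it as an array and let json_encode() produce the JSON, so the name pattern and type can come from variables. build_mql_query() is a hypothetical helper; how the resulting string is sent to mqlread (endpoint URL, parameter names) depends on the API version and is deliberately not shown.

```php
<?php
// Hypothetical helper: builds the MQL query shown above as a JSON string.
function build_mql_query(string $namePattern, string $type): string
{
    $query = [[
        '/common/topic/article' => ['guid' => null, 'limit' => 1, 'optional' => true],
        '/common/topic/image'   => ['id' => null, 'limit' => 1, 'optional' => true],
        'id'     => null,   // null asks MQL to fill the value in
        'name'   => null,
        'name~=' => $namePattern,
        'type'   => $type,
    ]];
    // JSON_UNESCAPED_SLASHES keeps "/common/topic/article" instead of "\/common\/...".
    return json_encode($query, JSON_UNESCAPED_SLASHES);
}

$json = build_mql_query('*Cancer*', '/user/radiusrs/default_domain/astrological_sign');
echo $json, "\n";
```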


劳资没心，怎么记你

Reply #7 · 2020-07-23 04:38

Use the following:

$("#LoadIntoThisDiv").load("http://www.freebase.com/search?limit=30&start=0&query=cancer #article-1001");

There is a similar example in the jQuery documentation for .load().
