I am using the Readability API to do this. Their example shows lead_image_url, but I could not fetch it.
Reference: https://www.readability.com/developers/api/parser
Is this the correct way to make a direct request?
It says: {"messages": "The API Key in the form of the 'token' parameter is invalid.", "error": true}
Another try:
<?php
define('TOKEN', "1b830931777ac7c2ac954e9f0d67df437175e66e");
define('API_URL', "https://www.readability.com/api/content/v1/parser?url=%s&token=%s");

function get_image($url) {
    // sanitize it so we don't break our api url
    $encodedUrl = urlencode($url);
    // build our url from the constants defined above
    $apiUrl = sprintf(API_URL, $encodedUrl, TOKEN);
    // call the api; file_get_contents() returns false on failure
    $response = file_get_contents($apiUrl);
    if ($response === false) {
        return false;
    }
    // decode into an associative array so the keys can be read with []
    $json = json_decode($response, true);
    if (!isset($json['lead_image_url'])) {
        return false;
    }
    return $json['lead_image_url'];
}
Error: Warning: file_get_contents(https://www.readability.com/api/content/v1/parser?url=http%3A%2F%2Fthenwat.com%2Fthenwat%2Finvite%2Findex.php&token=1b830931777ac7c2ac954e9f0d67df437175e66e): failed to open stream: HTTP request failed! HTTP/1.1 403 FORBIDDEN in F:\wamp\www\inviteold\test2.php on line 32
One more attempt:
require 'readability/lib/Readability.inc.php';
$url = 'http://www.nextbigwhat.com';
$html = file_get_contents($url);
$Readability = new Readability($html); // default charset is utf-8
$ReadabilityData = $Readability->getContent();
$image= $ReadabilityData['lead_image_url'];
$title= $ReadabilityData['title']; //This works fine.
$content = $ReadabilityData['word_count'];
echo "$content";
It says: Notice: Undefined index: lead_image_url in F:\wamp\www\inviteold\test2.php on line 13
First, in order to use the REST API that they provide, you need to create an account. Afterwards you can generate your own token to use in the call. The token provided by the examples will not work because it is purposefully invalid; it exists for illustration only.

Second, make sure the allow_url_fopen directive in your php.ini file is set to true. For the purposes of a test script, or if you cannot change your php.ini file (shared hosting solutions), you can try ini_set('allow_url_fopen', true); at the top of your page, although current PHP versions treat this directive as system-level, so the call may have no effect there.

Lastly, in order to parse the images yourself you'll need to retrieve all image elements from the DOM you retrieve. Sometimes there won't be any images, and sometimes there will be. It depends on what page you're pulling from. Additionally, you'll need to resolve relative paths.
Your Code
After executing Readability, you can use the DOMDocument class to retrieve your images from the contents you pulled. Instantiate a new DOMDocument and load in your HTML. Make sure to use the libxml_use_internal_errors function to suppress the errors that the parser raises on most websites. We'll put this in a function to make it easier to use elsewhere if need be.

You can now retrieve all image elements from the document you instantiated, and then get their src attribute.

Now you have an array of images that you can present to the user. But before you do that, there is one more thing: we want to resolve all relative paths so that we always have an absolute path to the image that lives on another site.
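The loading and extraction steps described above can be sketched roughly as follows. The function name sampleDomMedia is the one used in this answer; the body is an assumed reconstruction, and the inline markup stands in for a page you would normally download. At this stage the src values are returned exactly as they appear in the markup.

```php
<?php
// Collect the raw src value of every <img> element in an HTML string.
function sampleDomMedia(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $images[] = $src;
        }
    }
    return $images;
}

// Example with inline markup instead of a downloaded page.
$html = '<html><body><img src="/logo.png"><p>text</p>'
      . '<img src="http://example.com/a.jpg"><img alt="no source"></body></html>';
print_r(sampleDomMedia($html));
```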
To do this, we have to determine the base domain URL and the relative path to the current page we're working with. We can do so using the parse_url() function provided by PHP. For simplicity's sake, we can throw this into a function.

Add an additional parameter to the original sampleDomMedia function, and we can call this function to get our paths. Then we can check the src attribute's value to determine what kind of path it is, and resolve it.

And last, but certainly not least, we're left with the two previous functions and a short piece of procedural code that calls them.
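Putting the pieces together, one possible shape for the two functions and the calling code is sketched below. The helper name getUrlPaths and the keys of the array it returns are hypothetical, the example URL and markup are placeholders, and the path handling is deliberately simple: it does not normalise "../" segments.

```php
<?php
// Hypothetical helper: derive the base domain URL and the directory of the
// current page from a page URL, using parse_url().
function getUrlPaths(string $url): array
{
    $parts  = parse_url($url) ?: [];
    $scheme = $parts['scheme'] ?? 'http';
    $base   = $scheme . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $base .= ':' . $parts['port'];
    }
    $path  = $parts['path'] ?? '/';
    $slash = strrpos($path, '/');
    // Keep the path up to and including its last slash.
    $dir = $slash === false ? '/' : substr($path, 0, $slash + 1);

    return ['scheme' => $scheme, 'base' => $base, 'relative' => $base . $dir];
}

// Return an absolute URL for every <img> in $html; $sourceUrl is the page
// the markup came from.
function sampleDomMedia(string $html, string $sourceUrl): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paths  = getUrlPaths($sourceUrl);
    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src === '') {
            continue;
        }
        if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
            $images[] = $src;                          // already absolute
        } elseif (strpos($src, '//') === 0) {
            $images[] = $paths['scheme'] . ':' . $src; // protocol-relative
        } elseif ($src[0] === '/') {
            $images[] = $paths['base'] . $src;         // root-relative
        } else {
            $images[] = $paths['relative'] . $src;     // relative to the page
        }
    }
    return $images;
}

// Procedural part. In real use $html would come from file_get_contents($url).
$url  = 'http://www.example.com/blog/post.html';
$html = '<img src="logo.png"><img src="/img/a.jpg">'
      . '<img src="//cdn.example.com/b.png"><img src="https://example.org/c.gif">';
$images = sampleDomMedia($html, $url);
print_r($images);
```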
Also, if you think the contents of the article may have an image inside of it (it usually doesn't), you can pass the contents returned from Readability to the function rather than the $html variable.

I hope that helps.