I'm looking to write a library that "parses" information the way Facebook does when you post a link. However, as I don't want to reinvent the wheel, does anyone know of a library, or an effort to write a library, that does this already?
I have included an example so that you can get a grasp of what I mean if you don't use Facebook: http://lh4.ggpht.com/_zbED-KN_ZAI/Sx6LuDmZkVI/AAAAAAAADLs/mN7eFnzL1gE/s144/example.png
I haven't seen any library for that, but it looks like a pretty simple thing to do. I've jotted down a quick function which can help you out. I have kept it simple; you might want to use cURL to fetch the content, add more error handling, etc.
Anyway, here is my two cents:
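<?php
function getLinkInfo($url)
{
    // Get target link HTML; give up if the fetch fails
    $html = @file_get_contents($url);
    if ($html === false) {
        return '';
    }

    // Prepare the DOM document (options must be set before loading)
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false;
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Defaults, in case the page has no title, description, or image
    $linkTitle = $linkDesc = $firstImage = '';

    // Get page title
    $titles = $dom->getElementsByTagName('title');
    if ($titles->length > 0) {
        $linkTitle = trim($titles->item(0)->nodeValue);
    }

    // Get META tags; we only need the description
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $linkDesc = $meta->getAttribute('content');
            break;
        }
    }

    // Get all images; again, we need the first one only
    $imgs = $dom->getElementsByTagName('img');
    if ($imgs->length > 0) {
        $firstImage = $imgs->item(0)->getAttribute('src');
        // Naive handling of relative paths: prefix them with the page URL
        if (!preg_match('#^(https?:)?//#i', $firstImage)) {
            $firstImage = rtrim($url, '/') . '/' . ltrim($firstImage, '/');
        }
    }

    // Escape everything before embedding it in the markup
    $linkTitle  = htmlspecialchars($linkTitle, ENT_QUOTES, 'UTF-8');
    $linkDesc   = htmlspecialchars($linkDesc, ENT_QUOTES, 'UTF-8');
    $firstImage = htmlspecialchars($firstImage, ENT_QUOTES, 'UTF-8');
    $safeUrl    = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    $output = <<<HTML
<div class="info">
<div class="image"><img src="{$firstImage}" alt="{$linkTitle}" /></div>
<div class="desc">
<div class="title">{$linkTitle}</div>
<div class="subtitle">{$safeUrl}</div>
<div class="summary">{$linkDesc}</div>
</div>
</div>
HTML;

    return $output;
}

echo getLinkInfo("http://www.phpfour.com/");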
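As for the cURL suggestion above, here is a rough sketch of what fetching the page with cURL could look like. The helper name fetchHtml, the timeout, the redirect limit, and the user agent string are all made up for illustration; adjust them to taste. It returns null on any failure so the caller can decide what to do.

```php
<?php
// Hypothetical helper (not part of the original answer): fetch a page
// with cURL and return its body, or null if anything goes wrong.
function fetchHtml($url, $timeoutSeconds = 10)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects...
        CURLOPT_MAXREDIRS      => 5,     // ...but not forever (assumed limit)
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        // Only allow web URLs, so file:// and friends are rejected
        CURLOPT_PROTOCOLS      => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_USERAGENT      => 'LinkPreviewBot/0.1', // assumed value
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and non-2xx responses as failures
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}
```

You would then pass the result to the DOM parsing step in place of the file_get_contents() call, and skip the preview when it comes back as null.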
John Gruber has a regex pattern that might help:
A common programming problem: identify the URLs in an arbitrary string of text, where by “arbitrary” let’s agree we mean something unstructured such as an email message or a tweet.
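To give an idea of what the quoted problem looks like in PHP, here is a minimal sketch. Note that the pattern below is a deliberately simplified one of my own, not Gruber's actual regex, and extractUrls is just a name picked for this example. It only looks for http:// and https:// links and trims trailing punctuation.

```php
<?php
// Hypothetical helper using a simplified pattern (NOT Gruber's regex):
// pull http(s) URLs out of free-form text.
function extractUrls($text)
{
    // Match http:// or https:// up to the next whitespace, quote, or angle bracket
    preg_match_all('#\bhttps?://[^\s<>"\']+#i', $text, $matches);

    // Drop trailing punctuation that most likely belongs to the sentence
    return array_map(function ($url) {
        return rtrim($url, '.,;:!?)');
    }, $matches[0]);
}

print_r(extractUrls('See http://www.phpfour.com/, or (https://example.com/a?b=1).'));
```

For that sample string it should give back http://www.phpfour.com/ and https://example.com/a?b=1. The trade-off is that a URL which really does end in a closing parenthesis gets cut short, which is one of the cases a more careful pattern has to deal with.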