I need to extract data from a URL, such as the title, the description, and any videos or images on the page, the way the Facebook share button does,
like this: http://www.facebook.com/sharer.php?u=http://www.wired.com&t=Test
Regards
If the website supports oEmbed, using it is easier and more robust than scraping HTML.
oEmbed is supported by sites like YouTube and Flickr.
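A minimal Python sketch of an oEmbed lookup follows. It assumes the provider exposes a JSON oEmbed endpoint; the two endpoint URLs in `OEMBED_ENDPOINTS` are assumptions to check against each provider's documentation, and the helper names are hypothetical. The field names (`title`, `type`, `thumbnail_url`, `html`) come from the oEmbed response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed provider endpoints (hypothetical mapping); verify before relying on them.
OEMBED_ENDPOINTS = {
    "youtube.com": "https://www.youtube.com/oembed",
    "flickr.com": "https://www.flickr.com/services/oembed/",
}


def oembed_request_url(page_url: str, endpoint: str) -> str:
    """Build the oEmbed request: the endpoint plus the page URL and format=json."""
    return endpoint + "?" + urlencode({"url": page_url, "format": "json"})


def parse_oembed(payload: str) -> dict:
    """Pick the preview fields out of an oEmbed JSON response."""
    data = json.loads(payload)
    return {
        "title": data.get("title"),
        "type": data.get("type"),  # "photo", "video", "link" or "rich"
        "thumbnail": data.get("thumbnail_url"),
        "html": data.get("html"),  # embed markup for video/rich responses
    }


def fetch_oembed(page_url: str, endpoint: str, timeout: float = 10) -> dict:
    """Fetch and parse the oEmbed data for page_url (needs network access)."""
    with urlopen(oembed_request_url(page_url, endpoint), timeout=timeout) as resp:
        return parse_oembed(resp.read().decode("utf-8"))
```

For a video the provider returns ready-made embed markup in `html`, so there is no need to guess player URLs from the page itself.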
Use something like cURL to get the page and then something like Simple HTML DOM to parse it and extract the elements you want.
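The same fetch-then-parse approach can be sketched in Python, with `urllib.request` standing in for cURL and the standard-library `html.parser` standing in for Simple HTML DOM. This is a minimal sketch: the class and function names are hypothetical, and treating `video`, `source`, `iframe` and `embed` tags as "videos" is an assumption that will miss players injected by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class PreviewParser(HTMLParser):
    """Collects <title>, <meta name="description"> and media URLs from a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = None
        self.images = []
        self.videos = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.description = a.get("content")
        elif tag == "img" and a.get("src"):
            # Resolve relative paths against the page URL.
            self.images.append(urljoin(self.base_url, a["src"]))
        elif tag in ("video", "source", "iframe", "embed") and a.get("src"):
            self.videos.append(urljoin(self.base_url, a["src"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html: str, base_url: str) -> dict:
    """Parse already-downloaded HTML into title, description, images and videos."""
    parser = PreviewParser(base_url)
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "images": parser.images,
        "videos": parser.videos,
    }


def fetch_html(url: str, timeout: float = 10) -> str:
    """Download a page (the cURL step); needs network access."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (preview-sketch)"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `extract_preview(fetch_html("http://www.wired.com"), "http://www.wired.com")`. Keeping the download separate from the parsing makes the parser easy to test on saved HTML.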
Embed.ly has a nice API for exactly this purpose. It returns the site's oEmbed data if available; otherwise, it attempts to extract a summary of the page, much as Facebook does.
While looking for similar functionality, I came across a jQuery + PHP demo of Facebook's URL-extraction feature for messages: http://www.99points.info/2010/07/facebook-like-extracting-url-data-with-jquery-ajax-php/
Instead of an HTML DOM parser, it uses simple regular expressions that look for the title, description and img tags. As a result, image extraction performs poorly on the many websites that use CSS for images. It also differs from Facebook, which checks its own meta tags first and only then the classic HTML description tag, but it illustrates the principle well.
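A rough Python sketch of that regex approach, extended with the Facebook-style lookup order, might look like this. It assumes Facebook's own tags are the Open Graph `og:title`, `og:description` and `og:image` properties; the function names are hypothetical. Like the demo, it only sees `<img>` tags, so images set through CSS are missed.

```python
import re
from typing import Dict, List

META_RE = re.compile(r"<meta\b([^>]*)>", re.I)
ATTR_RE = re.compile(r"""([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)
IMG_RE = re.compile(r"""<img\b[^>]*\bsrc\s*=\s*["']([^"']+)["']""", re.I)


def meta_tags(html: str) -> Dict[str, str]:
    """Map each meta tag's property/name (lowercased) to its content; first one wins."""
    found: Dict[str, str] = {}
    for m in META_RE.finditer(html):
        # ATTR_RE yields (name, double-quoted value, single-quoted value) tuples.
        attrs = {k.lower(): (dq or sq) for k, dq, sq in ATTR_RE.findall(m.group(1))}
        key = (attrs.get("property") or attrs.get("name") or "").lower()
        if key and "content" in attrs and key not in found:
            found[key] = attrs["content"]
    return found


def regex_preview(html: str) -> Dict[str, object]:
    meta = meta_tags(html)
    title_match = TITLE_RE.search(html)
    images: List[str] = []
    if "og:image" in meta:
        images.append(meta["og:image"])
    images += IMG_RE.findall(html)  # only <img src>, CSS backgrounds are ignored
    return {
        # Open Graph properties first, then the classic HTML tags.
        "title": meta.get("og:title")
        or (title_match.group(1).strip() if title_match else None),
        "description": meta.get("og:description") or meta.get("description"),
        "images": images,
    }
```

Regular expressions keep the code short but break on unusual markup (unquoted attributes, `>` inside attribute values), which is one reason a real parser is usually the safer choice.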
I am working on a project for this problem, and it is not as easy as writing an HTML parser and expecting sites to be "semantic". Extracting videos and finding their auto-play parameters are especially painful. You can check the project at http://www.embedify.me, which also has an FB-style URL preview script. As I see it, embed.ly and oEmbed are passive parsers: they need the sites (so-called providers) to support them, which is quite a different approach from Facebook's.