In the days of link shorteners and Ajax, there can be many links that ultimately point to the same content. I was wondering what the best way is to get the final, best link for a web site in PHP, hopefully with a library. I was unable to find anything on Google or GitHub.
I have seen this example code, but it doesn't handle things like rel="canonical" link tags or default SSL ports: http://w-shadow.com/blog/2008/07/05/how-to-get-redirect-url-in-php/
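To make the "default SSL ports" point concrete: `https://example.com:443/a` and `https://example.com/a` name the same resource, so a resolver probably wants to treat them as equal. Below is a minimal sketch of that one normalization step, using only PHP's built-in `parse_url`. The function name `stripDefaultPort` is hypothetical, and the sketch only knows the defaults for `http` (80) and `https` (443).

```php
<?php
// Hypothetical helper (not from any library): remove a port that is already
// the default for the URL's scheme, and lowercase the scheme and host.
function stripDefaultPort(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL we can reason about; leave it alone
    }

    $scheme   = strtolower($parts['scheme']);
    $defaults = ['http' => 80, 'https' => 443];
    $port     = $parts['port'] ?? null; // parse_url() returns the port as an int

    if ($port !== null && ($defaults[$scheme] ?? null) === $port) {
        $port = null; // the port is redundant, drop it
    }

    // Reassemble the URL from its parts.
    $out = $scheme . '://';
    if (isset($parts['user'])) {
        $out .= $parts['user'];
        $out .= isset($parts['pass']) ? ':' . $parts['pass'] : '';
        $out .= '@';
    }
    $out .= strtolower($parts['host']);
    if ($port !== null) {
        $out .= ':' . $port;
    }
    $out .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $out .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $out .= '#' . $parts['fragment'];
    }
    return $out;
}
```

Non-default ports such as `:8080` are kept, since they do change which server is addressed.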
Facebook seems to handle this pretty well: you can see how it follows 301s, rel="canonical", etc. To see examples of the way Facebook handles it, use their Open Graph tool:
https://developers.facebook.com/tools/debug
and enter these links:
http://dlvr.it/xxb0W
https://twitter.com/#!/twitter/statuses/136946408275193856
Is there a PHP library that already does this: checks these headers, resolves 301 redirects, parses rel="canonical", detects redirect loops, and returns the best resulting URL to use?
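For reference, the redirect-following and loop-detection parts of that wish list can be sketched in plain PHP. This is only an illustration under stated assumptions, not an existing library: the names `resolveLocation`, `followRedirects` and `curlFetch` are hypothetical, the fetcher is passed in as a callable returning `[statusCode, locationOrNull]`, and relative `Location` values are resolved in a simplified way (no `../` normalization).

```php
<?php
// Resolve a Location header value against the URL that produced it.
// Simplified: handles absolute, scheme-relative, root-relative and plain
// relative values; it does not collapse "../" segments.
function resolveLocation(string $base, string $location): string
{
    if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $location)) {
        return $location; // already absolute
    }
    $b      = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (strpos($location, '//') === 0) {
        return $b['scheme'] . ':' . $location; // scheme-relative
    }
    if (strpos($location, '/') === 0) {
        return $origin . $location; // root-relative
    }
    $path = $b['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location; // relative to the current directory
}

// Follow redirects one hop at a time, remembering every URL already visited
// so that a loop (A -> B -> A) is reported instead of followed forever.
function followRedirects(string $url, callable $fetch, int $maxHops = 10): string
{
    $seen = [];
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        if (isset($seen[$url])) {
            throw new RuntimeException("Redirect loop detected at $url");
        }
        $seen[$url] = true;

        [$status, $location] = $fetch($url);
        $isRedirect = in_array($status, [301, 302, 303, 307, 308], true);
        if (!$isRedirect || $location === null || $location === '') {
            return $url; // final URL reached
        }
        $url = resolveLocation($url, $location);
    }
    throw new RuntimeException("Gave up after $maxHops redirects");
}

// One possible fetcher, assuming the cURL extension is available. It sends a
// HEAD request and lets followRedirects() do the hopping itself. Note that
// some servers answer HEAD differently from GET.
function curlFetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false,
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($ch);
    $status   = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $location = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return [$status, $location ?: null];
}
```

A call would look like `followRedirects('http://dlvr.it/xxb0W', 'curlFetch')`, wrapped in a try/catch for the `RuntimeException` cases. Passing the fetcher in as a callable keeps the loop logic testable without touching the network.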
As an alternative, I am open to APIs that can be used, but would prefer something that runs on my own server.
Using Guzzle (a well-known and robust HTTP client), you can do it like this:
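A minimal sketch of what that could look like, assuming Guzzle 6 or 7 is installed through Composer (`composer require guzzlehttp/guzzle`) and `vendor/autoload.php` has already been required. It relies on Guzzle's `allow_redirects` request option with `track_redirects` enabled, which records each hop in the `X-Guzzle-Redirect-History` response header. The function names `resolveWithGuzzle` and `lastUrlInHistory` are made up for this example.

```php
<?php
// Assumption: vendor/autoload.php from Composer has been required by the
// calling script, so that \GuzzleHttp\Client can be autoloaded.

// Pick the last URL from Guzzle's redirect history, or fall back to the
// starting URL when no redirect happened (the history is then empty).
function lastUrlInHistory(array $history, string $startUrl): string
{
    return $history ? end($history) : $startUrl;
}

// Let Guzzle follow the redirects and report where the request ended up.
function resolveWithGuzzle(string $url): string
{
    $client   = new \GuzzleHttp\Client();
    $response = $client->request('GET', $url, [
        'allow_redirects' => [
            'max'             => 10,   // stop after 10 hops
            'track_redirects' => true, // record each hop in a response header
        ],
        'timeout' => 10,
    ]);

    return lastUrlInHistory($response->getHeader('X-Guzzle-Redirect-History'), $url);
}
```

Guzzle throws an exception when the redirect limit is exceeded or, with its default settings, when the final response is a 4xx or 5xx, so the call belongs inside a try/catch. This only covers HTTP redirects; it does not look at rel="canonical" in the page body.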
I wrote you a little function to do it. It's simple, but it may be a starting point for you. Note: the http://dlvr.it/xxb0W URL returns an invalid URL in its Location response header.
You'll need the Altumo PHP library for it to work. It's a library that I wrote, but it's MIT-licensed, as is this function.
See: https://github.com/homer6/altumo
Also, you'll have to wrap the function in a try/catch.
Please let me know if you'd like further modifications or help getting it going.
Since I wasn't able to find any libraries that really did what I was looking for, and I was hoping to do more than just follow HTTP redirects, I have gone ahead and created a library that accomplishes the goals and released it under the MIT license. You can get it here:
https://github.com/mattwright/URLResolver.php
URLResolver.php is a PHP class that attempts to resolve URLs to a final, canonical link:
I am certainly not an expert on the rules of HTTP redirection, so if anyone has suggestions on how to improve this library, it would be greatly appreciated. I have tested it on thousands of URLs and it seems to do pretty well. I followed Mario's advice and used the PHP Simple HTML Parser library where needed.
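For readers who only need the HTML step, here is a minimal sketch of pulling a canonical URL out of a page. It is not the library's code: it uses PHP's built-in `DOMDocument` instead of the parser mentioned above, the function name `findCanonicalUrl` is hypothetical, and falling back to the Open Graph `og:url` meta tag is an assumption about what a reasonable second choice would be.

```php
<?php
// Return the URL a page declares for itself, or null if it declares none.
// Looks for <link rel="canonical" href="..."> first, then (as an assumed
// fallback) for <meta property="og:url" content="...">.
function findCanonicalUrl(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is messy
    $loaded   = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel can hold several space-separated tokens, e.g. "canonical nofollow"
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('canonical', $rels, true) && $link->getAttribute('href') !== '') {
            return $link->getAttribute('href');
        }
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('property')) === 'og:url'
            && $meta->getAttribute('content') !== '') {
            return $meta->getAttribute('content');
        }
    }

    return null;
}
```

The returned value may itself be relative or redirect somewhere else, so in practice it would be fed back into whatever follows the HTTP redirects.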