Is there a pre-existing function or class for URL normalization in PHP?
Specifically, I'd like to follow the semantics-preserving normalization rules laid out in this Wikipedia article on URL normalization (or whatever 'standard' I should be following):
- Converting the scheme and host to lower case
- Capitalizing letters in escape sequences
- Adding trailing / (to directories, not files)
- Removing the default port
- Removing dot-segments
Right now, I'm thinking that I'll just use parse_url() and apply the rules individually, but I'd prefer to avoid reinventing the wheel.
The PEAR Net_URL2 library looks like it'll do at least part of what you want. It removes dot-segments, fixes capitalization, and drops the default port:
require_once 'Net/URL2.php';
// Mixed-case scheme, explicit default port, and a dot-segment to normalize.
$url = new Net_URL2('HTTP://example.com:80/a/../b/c');
print $url->getNormalizedURL();
emits:
http://example.com/b/c
I doubt there's a general-purpose mechanism for adding trailing slashes to directories, because you need a way to map URLs to directories, which is hard to do generically. Otherwise, Net_URL2 gets you close.
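If pulling in PEAR isn't an option, the parse_url() route from the question can be sketched roughly as follows. This is only a minimal sketch, not a vetted implementation: normalizeUrl() and removeDotSegments() are hypothetical helper names (they are not part of PHP or Net_URL2), only http and https default ports are assumed, the input is assumed to be an absolute URL, and PHP 7.4+ is assumed for the arrow function. The trailing-slash rule is deliberately left out for the reason given above.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments in an absolute path.
function removeDotSegments(string $path): string
{
    $segments = explode('/', $path);
    $last = end($segments);
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            // Never pop the leading empty segment that represents the root "/".
            if (count($output) > 1) {
                array_pop($output);
            }
        } elseif ($segment !== '.') {
            $output[] = $segment;
        }
    }
    // A path ending in "." or ".." refers to a directory, so keep a trailing slash.
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }
    return implode('/', $output);
}

// Hypothetical helper: apply the semantics-preserving rules from the question.
function normalizeUrl(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // Not an absolute URL we can handle; leave it untouched.
    }

    // Lowercase the scheme and host.
    $scheme = strtolower($parts['scheme']);
    $host   = strtolower($parts['host']);

    // Keep any user info as-is.
    $userinfo = '';
    if (isset($parts['user'])) {
        $userinfo = $parts['user']
            . (isset($parts['pass']) ? ':' . $parts['pass'] : '') . '@';
    }

    // Drop the port when it is the default for the scheme (assumed list).
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = '';
    if (isset($parts['port']) && ($defaultPorts[$scheme] ?? null) !== $parts['port']) {
        $port = ':' . $parts['port'];
    }

    // Remove dot-segments; an empty path becomes "/".
    $path = removeDotSegments($parts['path'] ?? '/');

    // Uppercase the hex digits in percent-escapes (%3a becomes %3A).
    $upper = static fn(array $m): string => strtoupper($m[0]);
    $path  = preg_replace_callback('/%[0-9a-f]{2}/i', $upper, $path);
    $query = isset($parts['query'])
        ? '?' . preg_replace_callback('/%[0-9a-f]{2}/i', $upper, $parts['query'])
        : '';
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return "$scheme://$userinfo$host$port$path$query$fragment";
}
```

Calling normalizeUrl('HTTP://example.com:80/a/../b/c') with this sketch should give the same http://example.com/b/c as the Net_URL2 example. Unless you have a reason to avoid the dependency, the library is still the safer choice, since it has had more edge cases thrown at it than a hand-rolled function.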
References:
- http://pear.php.net/package/Net_URL2
- http://pear.php.net/package/Net_URL2/docs/latest/Net_URL2/Net_URL2.html