I'm storing URLs in a database, and I want to be able to know if two URLs are identical.
Generally, a trailing slash doesn't change the response you'd get from a server (e.g. http://www.google.com/ is the same as http://www.google.com).
Can I always blindly remove the trailing slash from any URL, without looking at anything?
Is that safe?
What I mean by "without looking at anything" is that I'd remove the slash from:
http://www.google.com/q?xxx=something&yyy=something/
I know the web server could theoretically return completely different things if it wanted, and I know sometimes going to a URL without the slash will redirect to one with the slash. My only intention here is determining if both URLs are the same.
Is this method safe?
No, it is not always safe. A web server can interpret the path part of the URL any way it likes. You cannot tell how it will resolve the URI without issuing a GET or HEAD request for the URL.
It may be safe in the sense that you'll get the same response with or without a trailing slash (and I can't guarantee that's true), but they can definitely mean different things. Consider a URL that references a directory, or something presented by the site as a directory. Using the URL
http://www.somesite.com/directory/
...makes it clear you're asking for a directory. If you hack off the trailing slash:
http://www.somesite.com/directory
...the site's going to take this as a request for a file called "directory", and get all confused for a moment. It'll likely interpret this as a request for a directory, but the meanings are not the same, and you might not get what you expect.
See this article for more detail.
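One place where the difference shows up without contacting any server is relative link resolution. The following is only a rough sketch using Python's standard `urllib.parse.urljoin`; the relative link `page.html` is a made-up example, not something from the site above.

```python
from urllib.parse import urljoin

with_slash = "http://www.somesite.com/directory/"
without_slash = "http://www.somesite.com/directory"

# "page.html" is a hypothetical relative link found on the page.
# With the trailing slash, "directory" is treated as a folder and kept.
resolved_with = urljoin(with_slash, "page.html")

# Without it, "directory" is treated as the last path segment (a file)
# and is replaced by the relative link.
resolved_without = urljoin(without_slash, "page.html")

print(resolved_with)     # http://www.somesite.com/directory/page.html
print(resolved_without)  # http://www.somesite.com/page.html
```

So even if the server returns the same page for both URLs, a client resolving links on that page can end up in different places.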
No. I've encountered situations where, depending on the settings in a .htaccess file, some directories or "clean URLs" (such as those generated by a CMS) could not be accessed without a trailing slash. It's rare and it might be a mistake on the part of the webmaster, but it can happen.
As others have noted, it's not always safe. If it will work for you, my recommendation is to store the URLs with their slashes intact and strip the slashes only when you do your comparison. You'll take a small performance hit, but I'd think that's better than sending someone to the wrong web page.
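A minimal sketch of that idea, assuming the URLs are plain strings and that dropping a single trailing slash is the only normalization you want. The function names `strip_trailing_slash` and `same_for_comparison` are made up for this example.

```python
def strip_trailing_slash(url: str) -> str:
    """Return url without its final slash, if it ends with one."""
    return url[:-1] if url.endswith("/") else url


def same_for_comparison(a: str, b: str) -> bool:
    """Compare two URLs while ignoring a single trailing slash.

    The stored values are not modified; only the copies used for
    the comparison are stripped.
    """
    return strip_trailing_slash(a) == strip_trailing_slash(b)


print(same_for_comparison("http://www.google.com/", "http://www.google.com"))
```

Note that this strips the slash blindly, so it also treats the query-string example from the question as matching its slash-less form. Whether that is acceptable depends on the caveats in the other answers.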