What is the meaning of a ^ sign in a URL?
I needed to crawl some link data from a webpage and I was using a simple handwritten PHP crawler for it. The crawler usually works fine; then I came to a URL like this:
http://www.example.com/example.asp?x7=3^^^^^select%20col1,col2%20from%20table%20where%20recordid%3E=20^^^^^
This URL works fine when typed in a browser but my crawler is not able to retrieve this page. I am getting an "HTTP request failed error".
Caret (^) is not a reserved character in URLs, so it should be acceptable to use as-is. However, if you re having problems, just replace it with its hex encoding
%5E
.And yeah, putting raw SQL in the URL is like a big flashing neon sign reading "EXPLOIT ME PLEASE!".
^
characters should be encoded, see RFC 1738 Uniform Resource Locators (URL):You could try URL encoding the
^
character.Caret is neither reserved nor "unreserved", which makes it an "unsafe character" in URLs. They should never appear in URLs unencoded. From RFC2396:
Based on the context, I'd guess they're a homespun attempt to URL-encode quote-marks.
The crawler may be using regular expressions to parse the URL and therefore is falling over because the caret (^) means beginning of line. I'm thinking these URLs are really bad practice since they are exposing the underlying database structure; whomever wrote this might want to consider serious refactoring!
HTH!