Is there a way to safely sanitize path input without using realpath()?
The aim is to prevent malicious inputs like ../../../../../path/to/file
$handle = fopen($path . '/' . $filename, 'r');
Not sure why you wouldn't want to use realpath, but path name sanitisation is a very simple concept, along the following lines:
1. If the path is relative (it does not start with /), prefix it with the current working directory and /, making it an absolute path.
2. Replace any sequence of more than one / with a single one (a).
3. Replace /./ with /.
4. Remove /. if at the end.
5. Replace /anything/../ with /.
6. Remove /anything/.. if at the end.
The text anything in this case means the longest sequence of characters that aren't /.
Note that those rules should be applied continuously until such time as none of them results in a change. In other words, do all six (one pass). If the string changed, go back and do all six again (another pass). Keep doing that until the string is the same as before the pass just executed.
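The six rules and the repeat-until-stable loop can be sketched in PHP roughly as follows. This is only an illustration, not a drop-in replacement for realpath(): the function name canonicalize_path and its $cwd parameter are hypothetical, "anything" is assumed to exclude .. itself so that two .. segments never cancel each other, and the POSIX special case for a leading // is ignored.

```php
<?php
// Sketch only: canonicalize_path() and $cwd are hypothetical names.
// Purely lexical: it never touches the file system and ignores symlinks.
function canonicalize_path(string $path, ?string $cwd = null): string
{
    // Rule 1: make a relative path absolute.
    if ($path === '' || $path[0] !== '/') {
        $cwd = $cwd ?? (getcwd() ?: '/');
        $path = rtrim($cwd, '/') . '/' . $path;
    }

    // Apply rules 2-6 until a full pass changes nothing.
    do {
        $before = $path;
        // Rule 2: collapse runs of slashes (a leading // is NOT special-cased here).
        $path = preg_replace('#/{2,}#', '/', $path);
        // Rule 3: /./ becomes /
        $path = str_replace('/./', '/', $path);
        // Rule 4: drop a trailing /.
        $path = preg_replace('#/\.$#', '', $path);
        // Rule 5: /anything/../ becomes / (assumption: "anything" is not "..")
        $path = preg_replace('#/(?!\.\./)[^/]+/\.\./#', '/', $path);
        // Rule 6: drop a trailing /anything/..
        $path = preg_replace('#/(?!\.\./)[^/]+/\.\.$#', '', $path);
    } while ($path !== $before);

    return $path === '' ? '/' : $path;
}

// Usage: resolve user input against a base directory, then check containment.
$base   = '/var/www/files';
$full   = canonicalize_path('../../etc/passwd', $base);          // '/var/etc/passwd'
$inside = strncmp($full, $base . '/', strlen($base) + 1) === 0;  // false, so reject
```

A result that still begins with /.. means the input tried to climb above the root, so it can be rejected the same way.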
Once those steps are done, you have a canonical path name that can be checked for a valid pattern. Most likely that will be anything that doesn't start with ../ (in other words, it doesn't try to move above the starting point). There may be other rules you want to apply, but that's outside the scope of this question.
(a) If you're working on a system that treats // at the start of a path as special, make sure you replace multiple / characters at the start with two of them. This is the only place where POSIX allows (but does not mandate) special handling for multiples; in all other cases, multiple / characters are equivalent to a single one.

There is a Remove Dot Segments algorithm described in RFC 3986 that is used to interpret and remove the special . and .. complete path segments from a referenced path during the process of relative URI reference resolution. You could use this algorithm for file system paths as well.
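A minimal PHP sketch of that algorithm, following steps A to E of RFC 3986 section 5.2.4, might look like this. The function names are my own choice, and the code only rewrites the string: it does not deal with backslashes, symlinks, or anything else the file system might do.

```php
<?php
// Sketch of RFC 3986, section 5.2.4 (Remove Dot Segments).
// remove_dot_segments() and pop_last_segment() are hypothetical names.

// Remove the last segment and its preceding "/" (if any) from the output buffer.
function pop_last_segment(string $output): string
{
    $pos = strrpos($output, '/');
    return $pos === false ? '' : substr($output, 0, $pos);
}

function remove_dot_segments(string $input): string
{
    $output = '';
    while ($input !== '') {
        if (strncmp($input, '../', 3) === 0) {          // A: leading "../"
            $input = substr($input, 3);
        } elseif (strncmp($input, './', 2) === 0) {     // A: leading "./"
            $input = substr($input, 2);
        } elseif (strncmp($input, '/./', 3) === 0) {    // B: "/./" becomes "/"
            $input = substr($input, 2);
        } elseif ($input === '/.') {                    // B: final "/." becomes "/"
            $input = '/';
        } elseif (strncmp($input, '/../', 4) === 0) {   // C: "/../" becomes "/", pop output
            $input = substr($input, 3);
            $output = pop_last_segment($output);
        } elseif ($input === '/..') {                   // C: final "/.." becomes "/", pop output
            $input = '/';
            $output = pop_last_segment($output);
        } elseif ($input === '.' || $input === '..') {  // D: lone "." or ".."
            $input = '';
        } else {                                        // E: move first segment to output
            $next = strpos($input, '/', 1);
            if ($next === false) {
                $output .= $input;
                $input = '';
            } else {
                $output .= substr($input, 0, $next);
                $input = substr($input, $next);
            }
        }
    }
    return $output;
}

// The two examples given in the RFC:
$a = remove_dot_segments('/a/b/c/./../../g');   // '/a/g'
$b = remove_dot_segments('mid/content=5/../6'); // 'mid/6'
```

Because .. segments that would climb above the start are simply dropped, '../../../etc/passwd' comes out as 'etc/passwd', which stays below whatever directory it is appended to.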
Since you only asked for sanitizing, maybe what you need is just a "fail on tricky paths" thing. If normally there wouldn't be any ../../stuff/../like/this in your path input, you only need to check this:
or just
This quick and dirty way, you can block any backward moves, and in most cases this is sufficient. (The second version returns a nonzero value instead of true, but hey, why not! The dash is a hack for index 0 of the string.)
Side note: also remember slashes vs. backslashes. I'd recommend converting backslashes to forward slashes first, but that's platform dependent.
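A rough sketch of the kind of "fail on tricky paths" check this answer describes, in both a strict-boolean form and the shorter dash-prefixed form. The function names are hypothetical, and the backslash conversion follows the side note above. The check only looks for ../, so an input that merely ends in .. is not caught.

```php
<?php
// Sketch only: both function names are hypothetical.

// Strict version: true if the input contains a "../" anywhere.
function has_parent_reference(string $path): bool
{
    $path = str_replace('\\', '/', $path); // treat backslashes as slashes
    return strpos($path, '../') !== false;
}

// Short version: returns a nonzero offset instead of true, or false if clean.
// The "-" prefix shifts a match at index 0 to index 1, so the result is never 0.
function has_parent_reference_short(string $path)
{
    return strpos('-' . str_replace('\\', '/', $path), '../');
}

// Usage: refuse the request instead of trying to repair the path.
$filename = '../../etc/passwd';
$rejected = has_parent_reference($filename); // true
```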
The simple form:
The complex form (from here):
As the functions above either did not work for me or were quite lengthy, I tried my own code:
The following function canonicalizes file system paths and path components of URIs. It is faster than Gumbo's RFC implementation.
Notes
It does not collapse sequences of multiple / into a single /, as this would not comply with RFC 3986.
It does not handle ..\backslash\paths.