Resolve a relative path in a URL with PHP

Posted 2019-02-18 01:08

Example 1: domain.com/dir_1/dir_2/dir_3/./../../../
Should resolve naturally in the browser into domain.com/

Example 2: domain.com/dir_1/dir_2/dir_3/./../../../test/././../new_dir/
Should resolve into domain.com/new_dir/

Example 3: domain.com/dir_1/dir_2/dir_3/./../../../test/dir4/../final
Should resolve into domain.com/test/final

The question is: how can I iterate through the string to do this? I feel like a for() loop would get confused at this point.

Transfrom relative path into absolute URL using PHP
and
PHP: How to resolve a relative url

Neither works for me in this case. I shouldn't need a reference point (base), since the objective is to clean up what I already have.

2 Answers
forever°为你锁心
#2 · 2019-02-18 01:34

What you want here is a "replaceDots" function.

It works by remembering the position of the last valid segment and removing that segment when a ".." follows it. The full description is in RFC 3986, http://tools.ietf.org/html/rfc3986; search that page for "Remove Dot Segments" (section 5.2.4).

You need more than one loop. The inner loop scans ahead to the next part, and if that part is dots the current part is skipped, and so on, but it can be trickier than that. Alternatively, break the path into parts and then follow the algorithm from the RFC:

  1. While the input buffer is not empty, loop as follows:

    A. If the input buffer begins with a prefix of "../" or "./", then remove that prefix from the input buffer; otherwise,

    B. if the input buffer begins with a prefix of "/./" or "/.", where "." is a complete path segment, then replace that prefix with "/" in the input buffer; otherwise,

    C. if the input buffer begins with a prefix of "/../" or "/..", where ".." is a complete path segment, then replace that prefix with "/" in the input buffer and remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,

    D. if the input buffer consists only of "." or "..", then remove that from the input buffer; otherwise,

    E. move the first path segment in the input buffer to the end of the output buffer, including the initial "/" character (if any) and any subsequent characters up to, but not including, the next "/" character or the end of the input buffer.

  2. Finally, the output buffer is returned as the result of remove_dot_segments.
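
Since the question asks for PHP, the steps above can be sketched roughly as follows. This is a minimal sketch of steps A to E, not a tested library routine; the function name remove_dot_segments is borrowed from the RFC text. The RFC applies the algorithm to the path component only. Passing a string such as "domain.com/dir_1/..." treats "domain.com" as the first segment, which works for the three examples in the question, but a surplus ".." would remove the host as well.

```php
<?php
// Sketch of RFC 3986 section 5.2.4, steps A-E.
// The function name is taken from the RFC text; it is not a PHP built-in.
function remove_dot_segments(string $input): string
{
    $output = '';
    while ($input !== '') {
        if (strncmp($input, '../', 3) === 0) {
            // A: drop a leading "../"
            $input = substr($input, 3);
        } elseif (strncmp($input, './', 2) === 0) {
            // A: drop a leading "./"
            $input = substr($input, 2);
        } elseif (strncmp($input, '/./', 3) === 0) {
            // B: "/./rest" becomes "/rest"
            $input = substr($input, 2);
        } elseif ($input === '/.') {
            // B: a final "/." becomes "/"
            $input = '/';
        } elseif (strncmp($input, '/../', 4) === 0 || $input === '/..') {
            // C: replace the prefix with "/" and remove the last output segment
            $input = ($input === '/..') ? '/' : substr($input, 3);
            $pos = strrpos($output, '/');
            $output = ($pos === false) ? '' : substr($output, 0, $pos);
        } elseif ($input === '.' || $input === '..') {
            // D: a lone "." or ".." is discarded
            $input = '';
        } else {
            // E: move the first segment (with its leading "/") to the output
            $next = strpos($input, '/', 1);
            if ($next === false) {
                $output .= $input;
                $input = '';
            } else {
                $output .= substr($input, 0, $next);
                $input = substr($input, $next);
            }
        }
    }
    return $output;
}

echo remove_dot_segments('domain.com/dir_1/dir_2/dir_3/./../../../test/dir4/../final');
```

For a full URL, one option is to split off the scheme and host first (for example with parse_url()) and pass only the path to this function.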

Here is my version of it in C++:

ortl_funcimp(len_t) _str_remove_dots(char_t* s, len_t len) {
  len_t x,yy;
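  /*
    Modifies the string in place by copying parts back, and returns
    the new length. Not sure if this is the best way to do it, since
    it involves many copies for deep relatives like
    ../../../../../myFile.cpp

    For each ../ it does one copy back. If the loop were implemented
    by writing into a separate buffer, you would have to do both, so
    this seems to be the best technique.

    Note: char_t, len_t, _c(), mem_move() and __checklenx() come from
    the author's own library and are not defined here. Judging by the
    calls below, mem_move takes (source, destination, byte count).
  */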
  __checklenx(s,len);
  x = 0;
  while (x < len) {
    if (s[x] == _c('.')) {
      x++;
      if (x < len) {
        if (s[x] == _c('.')) {
          x++;
          if (x < len) {
            if (s[x] == _c('/')) { // ../
              mem_move(&s[x],&s[x-2],(len-x)*sizeof(char_t));
              len -= 2;
              x -= 2;
            }
            else x++;
          }
          else len -= 2;// .. only
        }
        else if (s[x] == _c('/')){ // ./
          mem_move(&s[x],&s[x-1],(len-x)*sizeof(char_t));
          len--;
          x--;
        }
      }
      else --len;// terminating '.', remove
    }
    else if (s[x] == _c('/')) {
      x++;
      if (x < len) {
        if (s[x] == _c('.')) {
          x++;
          if (x < len) {
            if (s[x] == _c('/')) { // /./
              mem_move(&s[x],&s[x-2],(len-x)*sizeof(char_t));
              len -= 2;
              x -= 2;
            }
            else if (s[x] == _c('.')) { // /..
              x++;
              if (x < len) { //
                if (s[x] == _c('/')) {// /../
                  yy = x;
                  x -= 3;
                  if (x > 0) x--;
                  while ((x > 0) && (s[x] != _c('/'))) x--;
                  mem_move(&s[yy],&s[x],(len-yy) * sizeof(char_t));
                  len -= (yy - x);
                }
                else {
                  x++;
                }
              }
              else {// ends with /..
                x -= 3;
                if (x > 0) x--;
                while (x > 0 && s[x] != _c('/')) x--;
                s[x] = _c('/');
                x++;
                len = x;
              }
            }
            else x++;
          }
          else len--;// ends with /.
        }
        else x++;
      }
    }
    else x++;
  }
  return len;
}
劳资没心,怎么记你
#3 · 2019-02-18 01:39

This is a simpler problem than you think. All you need to do is explode() on the / character and process the individual segments using a stack. As you traverse the array from left to right: if you see ., do nothing; if you see .., pop an element from the stack; otherwise, push the segment onto the stack.

$str = 'domain.com/dir_1/dir_2/dir_3/./../../../';
$array = explode('/', $str);
$domain = array_shift($array); // the first element is the host

$parents = array(); // stack of resolved path segments
foreach ($array as $dir) {
    switch ($dir) {
        case '.':
            // Current directory: nothing to do
            break;
        case '..':
            // Parent directory: drop the most recent segment
            array_pop($parents);
            break;
        default:
            $parents[] = $dir;
            break;
    }
}

echo $domain . '/' . implode('/', $parents); // prints domain.com/

This will properly resolve the URLs in all of your test cases.

Note that error checking is left as an exercise for the reader (e.g. handling the case where the $parents stack is empty and you try to pop something off of it).
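
One possible way to handle the empty-stack case is sketched below. The function name resolve_dots and the $strict flag are made up for illustration and are not part of the answer above. In non-strict mode a surplus ".." is ignored, which relies on array_pop() returning null for an empty array; in strict mode it throws instead. Like the original snippet, it assumes the input has no scheme and that everything before the first "/" is the host.

```php
<?php
// Hypothetical helper: the name and the $strict flag are assumptions.
function resolve_dots(string $url, bool $strict = false): string
{
    $segments = explode('/', $url);
    $domain = array_shift($segments); // assumes no scheme, as in the question

    $stack = [];
    foreach ($segments as $segment) {
        if ($segment === '.') {
            continue; // current directory: nothing to do
        }
        if ($segment === '..') {
            if ($strict && count($stack) === 0) {
                throw new OutOfBoundsException('Path climbs above the root: ' . $url);
            }
            array_pop($stack); // returns null on an empty array, so this is safe
            continue;
        }
        $stack[] = $segment;
    }

    return $domain . '/' . implode('/', $stack);
}

echo resolve_dots('domain.com/dir_1/dir_2/dir_3/./../../../test/././../new_dir/');
```

Whether to ignore a surplus ".." or to reject it depends on the caller. Ignoring it is closer to what the RFC 3986 algorithm does; rejecting it may be preferable when the input is untrusted.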
