Input:
- Base URL:
www.example.com/1/2/index.php
- Relative URL:
../../index.php
Output:
- Absolute URL:
www.example.com/index.php
It would be perfect if it could be done using sed.
As I understand it, the regex should delete one somefolder/ for every ../ in the URL.
realpath
is a quick but slightly hacky way to do what you want.
(Actually, I'm surprised that it doesn't deal properly with URLs; it treats them as plain old filesystem paths.)
~$ realpath -m http://www.example.com/1/2/../../index.php
=>
/home/username/http:/www.example.com/index.php
The -m
(for "missing") says to resolve the path even if components of it don't actually exist on the filesystem.
So you'll still have to strip off the actual filesystem part of that (which will just be $(pwd)).
Note that the slash-slash for the protocol was also canonicalized to a single slash, so you might be better off leaving the "http://" off your input and prepending it to your output instead.
See man 1 realpath
for the full story. Or info coreutils 'realpath invocation'
for a more verbose full story, if you have the info system installed.
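As a rough sketch of that last suggestion, the function below splits off the scheme, lets realpath -m collapse the rest, and prepends the scheme again. The name resolve_with_realpath is made up for illustration, GNU coreutils realpath is assumed, and instead of stripping $(pwd) afterwards it runs realpath from / so that only a single leading slash has to be removed.

```shell
#!/bin/bash
# Hypothetical helper (not part of any tool): resolve "../" segments in a URL
# with realpath -m. Assumes GNU coreutils realpath is installed.
resolve_with_realpath() {
    local url=$1 scheme='' rest resolved
    # Split off "http://" (or any other scheme) so realpath never sees the "//".
    if [[ $url == *://* ]]; then
        scheme="${url%%://*}://"
        rest="${url#*://}"
    else
        rest=$url
    fi
    # realpath resolves relative paths against the current directory, so run it
    # from "/" in a subshell; the result then only has one leading "/" to strip.
    resolved=$(cd / && realpath -m -- "$rest")
    printf '%s%s\n' "$scheme" "${resolved#/}"
}

resolve_with_realpath 'http://www.example.com/1/2/../../index.php'
```

This still treats the URL as a filesystem path, so query strings and fragments are not handled specially.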
Using sed
inside bash
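#!/bin/bash
base_url='www.example.com/1/2/index.php'
rel_url='../../index.php'
# Join the two URLs with ';', then replace the base file name and the ';'
# with a single '/': www.example.com/1/2/../../index.php
str="${base_url};${rel_url}"
str=$(echo "$str" | sed -E 's#/[^/]*;#/#')
# Remove one 'folder/../' pair per pass until none is left.
# [^/]+ also matches folder names containing '-' or '.', which \w+ would not.
while echo "$str" | grep -qE '[^/]+/\.\./'
do
    str=$(echo "$str" | sed -E 's#[^/]+/\.\./##')
done
abs_url=$str
echo "$abs_url"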
Output:
www.example.com/index.php
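The loop above can also live inside sed itself, using a label and the t (branch if substituted) command. This is only a sketch: collapse_dotdot is a made-up name, the folder name is matched as [^/]+, and the input is assumed to be the already joined path with no more ../ components than folders in front of them.

```shell
#!/bin/bash
# Hypothetical helper: collapse 'folder/../' pairs with a loop inside sed.
collapse_dotdot() {
    # :a       defines a label
    # s#...##  removes the first 'folder/../' pair on the line
    # ta       jumps back to :a if the substitution changed anything
    printf '%s\n' "$1" | sed -E -e ':a' -e 's#[^/]+/\.\./##' -e 'ta'
}

collapse_dotdot 'www.example.com/1/2/../../index.php'
```

Because sed repeats the substitution until nothing matches, no shell loop or grep call is needed.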
If your only requirement is to turn ..
into "up one level" then this is a possible solution. It doesn't use regular expressions or sed, or a JVM for that matter ;)
#!/bin/bash
domain="www.example.com"
origin="1/2/3/4/index.php"
rel="../../index.php"
awk -v rel="$rel" -v origin="$origin" -v file="$(basename "$rel")" -v dom="$domain" '
BEGIN {
    # c counts the ".." components in the relative URL
    n = split(rel, a, "/")
    for(i = 1; i <= n; ++i) {
        if(a[i] == "..") ++c
    }
    abs = dom
    # keep every folder of origin except the last c (b[m] is the file name)
    m = split(origin, b, "/")
    for(i = 1; i < m - c; ++i) {
        abs = abs"/"b[i]
    }
    print abs"/"file
}'
An alternative approach to using awk
, credit to Edward for mentioning realpath -m
:
#!/bin/bash
rel="../../index.php"
origin="www.example.com/1/2/index.php"
directory=$(dirname "$origin")
fullpath=$(realpath -m "$directory/$rel")
# Strip the current directory that realpath prepended
echo "${fullpath#"$(pwd)"/}"
You can't use a single regular expression for this, because regular expressions can't count: each ../ has to cancel exactly one preceding folder, however many of them there are.
You should use a real programming language instead. Even Java can do this easily.
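The counting itself only needs a loop and a stack, so as an illustration here is a minimal sketch in plain bash rather than Java. The function name resolve_url is made up, and it assumes scheme-less URLs like the ones in the question, with no query string or fragment.

```shell
#!/bin/bash
# Hypothetical helper: resolve a relative URL against a base URL with a stack.
# Assumes scheme-less URLs (host/path/file) as in the question.
resolve_url() {
    local base=$1 rel=$2 part
    local -a parts stack=()
    # Drop the base file name, append the relative URL, and split on "/".
    IFS='/' read -r -a parts <<< "${base%/*}/${rel}"
    for part in "${parts[@]}"; do
        case $part in
            ''|.)
                ;;  # skip empty and "." segments
            ..)
                # pop one folder, but never the host in stack[0]
                if (( ${#stack[@]} > 1 )); then
                    stack=("${stack[@]:0:${#stack[@]}-1}")
                fi
                ;;
            *)
                stack+=("$part")  # push a normal segment
                ;;
        esac
    done
    # Join the remaining segments with "/" (IFS change is local to the subshell).
    (IFS='/'; printf '%s\n' "${stack[*]}")
}

resolve_url 'www.example.com/1/2/index.php' '../../index.php'
```

Each .. pops exactly one segment, which is the pairing a single pattern cannot express; extra .. components beyond the host are simply ignored.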