I am writing a web crawler in PHP. Given the current URL and an array of links that may be absolute, relative, or root-relative, how would I determine the fully-qualified URL for each link?
For example, let's say I am crawling the URL:
http://www.example.com/path/to/my/file.html
And the array of links that the webpage contains is:
array(
'http://www.some-other-domain.com/',
'../../',
'/search',
);
How would I determine the fully-qualified URL for each of those links? The result I am looking for in this example would be, respectively:
http://www.some-other-domain.com/
http://www.example.com/path/
http://www.example.com/search
I think the easiest way is to use a library like this: http://www.electrictoolbox.com/php-resolve-relative-urls-absolute/
Examples from the link:
resolves to
http://www.example.com/aboutus.html
or
resolves to
http://www.example.com/images/somephoto.jpg
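If you would rather not pull in a library, here is a minimal sketch of the same idea using only parse_url(). The function name resolve_url() is my own (hypothetical), not the function from the linked library, and it assumes the base URL is an absolute http/https URL with a scheme and host. It is a simplified take on the reference-resolution rules in RFC 3986 and ignores some edge cases, such as user:password@ credentials in the base URL and keeping the base query string for fragment-only links.

```php
<?php
// Hypothetical helper: resolve $link against the absolute URL $base.
// Assumes $base has a scheme and a host (e.g. an http or https URL).
function resolve_url(string $base, string $link): string
{
    // A link that already has a scheme (http:, mailto:, ...) is returned unchanged.
    if (is_string(parse_url($link, PHP_URL_SCHEME))) {
        return $link;
    }

    $b = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host']
            . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative link such as //cdn.example.com/x.js: reuse the base scheme.
    if (strpos($link, '//') === 0) {
        return $b['scheme'] . ':' . $link;
    }

    // Split off any query string or fragment so only the path is normalised.
    $cut    = strcspn($link, '?#');
    $path   = substr($link, 0, $cut);
    $suffix = (string) substr($link, $cut);

    $basePath = $b['path'] ?? '/';
    if ($path === '') {
        // Link is only "?query" or "#fragment": keep the base document's path.
        $path = $basePath;
    } elseif ($path[0] !== '/') {
        // Relative link: append it to the directory of the base document.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }

    // Collapse "." and ".." segments, never climbing above the root.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    // A trailing "." or ".." names a directory, so keep the trailing slash.
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }

    return $origin . implode('/', $out) . $suffix;
}

// Usage with the URLs from the question.
$base  = 'http://www.example.com/path/to/my/file.html';
$links = array('http://www.some-other-domain.com/', '../../', '/search');
foreach ($links as $link) {
    echo resolve_url($base, $link), "\n";
}
// Prints:
// http://www.some-other-domain.com/
// http://www.example.com/path/
// http://www.example.com/search
```

Note that '/search' resolves without a trailing slash, since the link itself does not have one. For a production crawler a well-tested library is probably still the safer choice, because real pages contain plenty of malformed links that a sketch like this does not handle.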