How to get fully-qualified URL from anchor href?

Posted 2019-06-01 07:07

I am writing a web crawler in PHP. Given the current URL and an array of links that may be absolute, relative, or root-relative, how would I determine the fully-qualified URL for each link?

For example, let's say I am crawling the URL:

http://www.example.com/path/to/my/file.html

And the array of links that the webpage contains is:

array(
    'http://www.some-other-domain.com/',
    '../../',
    '/search',
);

How would I determine the fully-qualified URL for each of those links? The result I am looking for in this example would be, respectively:

http://www.some-other-domain.com/
http://www.example.com/path/
http://www.example.com/search/

1 answer
聊天终结者
Reply #2 · 2019-06-01 07:37

I think the easiest way is to use a library like this: http://www.electrictoolbox.com/php-resolve-relative-urls-absolute/

Examples from the link:

url_to_absolute('http://www.example.com/sitemap.html', 'aboutus.html');

resolves to http://www.example.com/aboutus.html

or

url_to_absolute('http://www.example.com/content/sitemap.html', '../images/somephoto.jpg');

resolves to http://www.example.com/images/somephoto.jpg
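If adding a library is not an option, the same resolution can be sketched by hand. The code below is only a minimal sketch that loosely follows the reference-resolution steps of RFC 3986 section 5.2. The function name resolve_url() is hypothetical (it is not the url_to_absolute() function from the linked library), and it assumes the base URL always has a scheme and a host. It also collapses empty path segments, which a strict implementation would preserve.

```php
<?php
// Hypothetical helper for illustration; NOT the library's url_to_absolute().
// Assumes $base is an absolute URL with a scheme and a host.
function resolve_url(string $base, string $link): string
{
    // A link that already carries a scheme (http:, mailto:, ...) is returned unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }

    $b = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host']
        . (isset($b['port']) ? ':' . $b['port'] : '');

    // Protocol-relative link ("//host/path"): borrow only the base scheme.
    if (strpos($link, '//') === 0) {
        return $b['scheme'] . ':' . $link;
    }

    // Split off the query string / fragment so they are not treated as path.
    $suffix = '';
    $pos = strcspn($link, '?#');
    if ($pos < strlen($link)) {
        $suffix = substr($link, $pos);
        $link = substr($link, 0, $pos);
    }

    $basePath = $b['path'] ?? '/';

    if ($link === '') {
        // Query-only or fragment-only link: keep the base path.
        // The base query survives only when the link has no query of its own.
        if (($suffix === '' || $suffix[0] === '#') && isset($b['query'])) {
            $suffix = '?' . $b['query'] . $suffix;
        }
        return $origin . $basePath . $suffix;
    }

    if ($link[0] === '/') {
        // Root-relative link: replaces the whole base path.
        $path = $link;
    } else {
        // Relative link: replaces everything after the last "/" of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $link;
    }

    // Remove "." and ".." segments.
    $segments = explode('/', $path);
    $last = end($segments);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '' || $seg === '.') {
            continue;
        }
        if ($seg === '..') {
            array_pop($out); // popping an empty array is a no-op, so we never climb above the root
            continue;
        }
        $out[] = $seg;
    }
    $path = '/' . implode('/', $out);

    // Keep the trailing slash when the path ended in "/", "." or "..".
    if ($out && in_array($last, ['', '.', '..'], true)) {
        $path .= '/';
    }

    return $origin . $path . $suffix;
}

// Usage with the URLs from the question.
$base = 'http://www.example.com/path/to/my/file.html';
foreach (['http://www.some-other-domain.com/', '../../', '/search'] as $link) {
    echo resolve_url($base, $link), PHP_EOL;
}
```

With the base URL from the question, this sketch leaves the first link unchanged and resolves '../../' to http://www.example.com/path/. Note that '/search' resolves to http://www.example.com/search without a trailing slash, because resolution does not append one. For production crawling, a tested library is still likely the safer choice, since edge cases such as userinfo, IPv6 hosts, and malformed URLs are not handled here.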
