I am building a crawler application. I want to crawl websites and find the depth of each page retrieved. I have read about various crawling and parsing tools, but none of them seems to support calculating depth. I am also unsure which crawler tool comes closest to the functionality I need. Any help is appreciated.
Answer 1:
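The most important thing is probably how you map your domain, that is, how you model the site's pages and the links between them, rather than which parser you use.
If you store the crawled URLs in a tree rooted at the start page (see Wikipedia for more on tree data structures), the depth of a URL is easy to calculate: it is the number of links on the path from the root to that URL. Since a page can usually be reached by several paths, what you normally want is the minimum depth.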
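To make the idea concrete, here is a minimal sketch in Python. It is only an illustration, not a specific crawler tool's API: `crawl_depths`, `get_links`, and the `SITE` dictionary are hypothetical names, and `SITE` stands in for real pages so the example runs without network access. It visits pages breadth-first, so the first time a URL is seen is also its shortest path from the start page.

```python
from collections import deque


def crawl_depths(start_url, get_links, max_depth=None):
    """Return a dict mapping each reachable URL to its minimum depth.

    get_links is any callable that returns the outgoing links of a URL.
    In a real crawler it would download and parse the page; here it is
    left to the caller (an assumption made for this sketch).
    """
    depths = {start_url: 0}      # the start page is the root, depth 0
    queue = deque([start_url])   # FIFO queue gives breadth-first order

    while queue:
        url = queue.popleft()
        depth = depths[url]
        # Optionally stop expanding pages once the depth limit is reached.
        if max_depth is not None and depth >= max_depth:
            continue
        for link in get_links(url):
            # Breadth-first order means the first visit is the shortest
            # path, so a URL that is already recorded is never overwritten.
            if link not in depths:
                depths[link] = depth + 1
                queue.append(link)
    return depths


# Hypothetical in-memory site used in place of real HTTP requests.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": ["/", "/blog/post-2"],
    "/blog/post-2": [],
}

depths = crawl_depths("/", lambda url: SITE.get(url, []))
for url, depth in depths.items():
    print(depth, url)
```

With this sample data, `/about` and `/blog` end up at depth 1, `/blog/post-1` at depth 2, and `/blog/post-2` at depth 3, even though some pages link back to the root. In a real crawler you would replace the lambda with a function that fetches the page, extracts its links, and normalizes them to absolute URLs before returning them.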
Hope this helps.