How to mirror only a section of a website?

Posted 2019-01-29 17:12

I cannot get wget to mirror a section of a website (a folder path below root) - it only seems to work from the website homepage.

I've tried many options; here is one example:

wget -rkp -l3 -np  http://somewebsite/subpath/down/here/

I only want to mirror the content linked below that URL, but I also need to download all the page assets that are not in that path.

It seems to work fine for the homepage (/) but I can't get it going for any sub folders.

Tags: wget mirror
3 answers
The star\"
#2 · 2019-01-29 17:39

Use the --mirror (-m) and --no-parent (-np) options, plus a few other useful ones, as in this example:

wget --mirror --page-requisites --adjust-extension --no-parent --convert-links \
     --directory-prefix=sousers http://stackoverflow.com/users
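To make the role of each option easier to see, here is a minimal sketch that wraps the same wget options in a small shell function. The function name mirror_section and the DRY_RUN variable are hypothetical helpers introduced only for illustration; they are not part of wget. The URL in the usage line is the placeholder from the question.

```shell
#!/bin/sh
# Sketch only: mirror_section and DRY_RUN are made-up names, not wget features.
mirror_section() {
    url=$1    # section to mirror; keep the trailing slash on a directory URL
    dest=$2   # local directory to save into

    # --mirror            recursive download with unlimited depth and timestamping
    # --no-parent         never ascend above the starting directory
    # --page-requisites   also fetch the images, CSS and scripts each page needs
    # --adjust-extension  save HTML/CSS files with a matching file extension
    # --convert-links     rewrite links so the local copy can be browsed offline
    # --directory-prefix  store everything under "$dest"
    set -- wget --mirror --no-parent --page-requisites \
        --adjust-extension --convert-links \
        --directory-prefix="$dest" "$url"

    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # only print the command that would be run
    else
        "$@"                 # run wget with the arguments built above
    fi
}
```

For example, after setting DRY_RUN=1, calling mirror_section http://somewebsite/subpath/down/here/ mirror prints the full wget command without touching the network. The trailing slash matters with --no-parent: without it, wget treats the last path component as a file and takes its parent as the starting directory.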
forever°为你锁心
#3 · 2019-01-29 17:39

I usually use:

wget -m -np -p "$url"
Viruses.
#4 · 2019-01-29 17:40

I use pavuk for mirroring, as it has seemed much better suited to this purpose from the start. You can use something like this:

/usr/bin/pavuk -enable_js -fnrules F '*.php?*' '%o.php' -tr_str_str '?' '_questionmark_' \
               -norobots -dont_limit_inlines -dont_leave_dir \
               http://www.example.com/some_directory/ >OUT 2>ERR