Wget: Skip download if file already exists?

Posted 2019-04-07 21:31

Question:

Answers to "Skip download if files exist in wget?" say to use -nc, or --no-clobber, but -nc doesn't stop wget from sending the HTTP request and then downloading the file. It just does nothing after the download if the file has already been fully retrieved. Is there any way to avoid making the HTTP request at all if the file already exists?

I installed wget 1.16.3 with Homebrew. When I ran the command below, wget printed something like "making HTTP request" for each file that already existed, appeared to download it, and then printed something like "file already retrieved, nothing to do".

wget --user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12' \
     --tries=1 \
     --no-clobber \
     --continue \
     --wait=0.3 \
     --random-wait \
     --adjust-extension \
     --load-cookies cookies.txt \
     --save-cookies cookies.txt \
     --keep-session-cookies \
         --recursive \
         --level=inf \
         --convert-links \
         --page-requisites \
         --reject=edit,logout,rate \
         --domains=example.com,s3.amazonaws.com \
         --span-hosts \
         --exclude-directories=/admin \
     http://example.com/

Answer 1:

The -nc option does what you're asking for, at least in wget 1.19.1.


On my server, I have a file called index.html which contains links to a.html and b.html.
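
A test site like the one described could be set up roughly as follows. This is only a sketch: the function name make_test_site and the page contents are made up for illustration, and the answer does not say how the original files were created.

```shell
# Hypothetical helper: create index.html linking to a.html and b.html.
make_test_site() {
    dir=$1
    mkdir -p "$dir"
    # index.html carries the two links that wget -r will follow
    printf '<a href="a.html">a</a>\n<a href="b.html">b</a>\n' > "$dir/index.html"
    printf '<p>page a</p>\n' > "$dir/a.html"
    printf '<p>page b</p>\n' > "$dir/b.html"
}
```

The log lines below resemble the output of Python's built-in web server, so running "make_test_site site && cd site && python3 -m http.server 8000" is one plausible way to serve the files. That is an assumption; the answer does not name the server it used.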

$ wget -r -nc http://127.0.0.1:8000/

Server logs show this:

127.0.0.1 - - [23/Mar/2017 17:51:25] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /a.html HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /b.html HTTP/1.1" 200 -

Now I remove b.html and run it again:

$ rm 127.0.0.1\:8000/b.html
$ wget -r -nc http://127.0.0.1:8000/

Server logs show this:

127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /b.html HTTP/1.1" 200 -

As you can see, only a request for b.html was made.
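
For a single URL outside of a recursive crawl, the same "no request if the file is already there" behavior can be approximated by checking the local path before calling wget at all. This is a minimal sketch, not how wget implements -nc internally; the function name fetch_if_missing and the ".part" suffix are hypothetical.

```shell
# Hypothetical helper: fetch URL into DEST only if DEST does not exist yet.
fetch_if_missing() {
    url=$1
    dest=$2
    if [ -e "$dest" ]; then
        # Local copy present: return without contacting the server.
        echo "skip: $dest already exists"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    # Download to a temporary name first so a failed transfer does not
    # leave a file that later runs would mistake for a finished download.
    wget -O "$dest.part" "$url" && mv "$dest.part" "$dest"
}
```

Example call, reusing the paths from this answer: fetch_if_missing http://127.0.0.1:8000/b.html 127.0.0.1:8000/b.html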



Answer 2:

It appears you are using incompatible options. I get the following warning with wget 1.16 on Linux:

$ wget --no-clobber --convert-links http://example.com
Both --no-clobber and --convert-links were specified, only --convert-links will be used.
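
Given that warning, one way to keep --no-clobber in effect is to make sure --convert-links never reaches wget. The wrapper below is a sketch under that assumption; the name wget_no_clobber is hypothetical, and it only recognizes the option when it is spelled as a separate -k or --convert-links argument.

```shell
# Hypothetical wrapper: drop --convert-links / -k, then run wget with --no-clobber.
wget_no_clobber() {
    # Rotate through the argument list once, re-appending every argument
    # except the ones that would make wget ignore --no-clobber.
    for arg in "$@"; do
        shift
        case "$arg" in
            -k|--convert-links) ;;      # dropped (combined forms like -rk are not detected)
            *) set -- "$@" "$arg" ;;
        esac
    done
    wget --no-clobber "$@"
}
```

The trade-off is that links in the saved pages are no longer rewritten for local browsing, so this only fits cases where skipping existing files matters more than an offline-browsable copy.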


Tags: wget