Wget: Skip download if file already exists?

Answers to "Skip download if files exist in wget?" say to use -nc (--no-clobber), but -nc doesn't prevent wget from sending the HTTP request and downloading the file. It just doesn't do anything after the download if the file has already been fully retrieved. Is there any way to prevent the HTTP request from being made when the file already exists?

I installed wget 1.16.3 with Homebrew. After I ran the command below, wget reported making an HTTP request for each file that already existed, appeared to download it, and then printed something like "file already retrieved, nothing to do".

wget --user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12' \
     --tries=1 \
     --no-clobber \
     --continue \
     --wait=0.3 \
     --random-wait \
     --adjust-extension \
     --load-cookies cookies.txt \
     --save-cookies cookies.txt \
     --keep-session-cookies \
     --recursive \
     --level=inf \
     --convert-links \
     --page-requisites \
     --reject=edit,logout,rate \
     --domains=example.com,s3.amazonaws.com \
     --span-hosts \
     --exclude-directories=/admin \
     http://example.com/
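When the URLs are known in advance (rather than discovered during a recursive crawl), one workaround outside wget itself is to check for the local copy before wget is invoked at all, so no request is sent. This is a minimal sketch, assuming files are saved with wget's -x (--force-directories) layout of <host>/<path>, that a URL ending in "/" (or a bare host) maps to index.html, that the URLs have no query strings or fragments, and that --adjust-extension is not renaming anything. The function names local_path_for and fetch_if_missing are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: only call wget when the local copy is missing.
# Assumes wget's -x layout (<host>/<path>) and plain URLs with no
# query string or fragment; --adjust-extension renames are not handled.

local_path_for() {
  # Strip the scheme: http://example.com/a/b.html -> example.com/a/b.html
  local path="${1#*://}"
  case "$path" in
    */)  path="${path}index.html" ;;   # directory URL -> index.html
    */*) ;;                            # ordinary file path, keep as-is
    *)   path="${path}/index.html" ;;  # bare host -> host/index.html
  esac
  printf '%s\n' "$path"
}

fetch_if_missing() {
  local url="$1" path
  path="$(local_path_for "$url")"
  if [ -e "$path" ]; then
    # File is already on disk: skip wget entirely, so no HTTP request.
    echo "skip: $path already exists"
    return 0
  fi
  wget -x --tries=1 "$url"
}
```

For example, fetch_if_missing http://example.com/page.html prints a skip message if example.com/page.html exists and otherwise downloads it. This does not change what happens inside a single wget -r run, where wget itself decides which discovered links to fetch.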

Tags: wget

2 answers

疯言疯语

Post #2 · 2019-04-07 22:25

It appears you are using incompatible options. I get the following warning with wget 1.16 on Linux:

$ wget --no-clobber --convert-links http://example.com
Both --no-clobber and --convert-links were specified, only --convert-links will be used.
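Given that warning, wget discards --no-clobber whenever --convert-links is also present. A sketch of the question's command with only --convert-links removed follows, wrapped in a hypothetical function mirror_no_clobber (the function name and the UA variable are illustrative; every option is copied from the question). Without --convert-links, the saved pages keep their original, non-local links.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the crawl from the question minus --convert-links,
# so that wget does not discard --no-clobber (see the warning above).

UA='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12'

mirror_no_clobber() {
  local opts=(
    --user-agent="$UA"
    --tries=1
    --no-clobber
    --continue
    --wait=0.3
    --random-wait
    --adjust-extension
    --load-cookies cookies.txt
    --save-cookies cookies.txt
    --keep-session-cookies
    --recursive
    --level=inf
    --page-requisites
    --reject=edit,logout,rate
    --domains=example.com,s3.amazonaws.com
    --span-hosts
    --exclude-directories=/admin
  )
  # Remaining arguments (normally the start URL) are passed through.
  wget "${opts[@]}" "$@"
}
```

Usage: mirror_no_clobber http://example.com/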

等我变得足够好

Post #3 · 2019-04-07 22:36

The -nc option does what you're asking for, at least in wget 1.19.1.

On my server, I have a file called index.html which contains links to a.html and b.html.

$ wget -r -nc http://127.0.0.1:8000/

Server logs show this:

127.0.0.1 - - [23/Mar/2017 17:51:25] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /a.html HTTP/1.1" 200 -
127.0.0.1 - - [23/Mar/2017 17:51:25] "GET /b.html HTTP/1.1" 200 -

Now I remove b.html and run it again:

$ rm 127.0.0.1\:8000/b.html
$ wget -r -nc http://127.0.0.1:8000/

Server logs show this:

127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /robots.txt HTTP/1.1" 404 -
127.0.0.1 - - [23/Mar/2017 17:51:38] "GET /b.html HTTP/1.1" 200 -

As you can see, apart from robots.txt, only b.html was requested; neither / nor a.html was fetched again.
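To reproduce this experiment, the test site can be recreated locally. This is a minimal sketch, assuming the server is Python's built-in http.server (the log lines above match its format, but the answer does not say which server was used). The helper name make_demo_site and the page contents are hypothetical; only the two links in index.html matter for the crawl.

```shell
#!/usr/bin/env bash
# Hypothetical helper that recreates the answer's test site:
# index.html linking to a.html and b.html (contents are placeholders).

make_demo_site() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/index.html" <<'EOF'
<!DOCTYPE html>
<html><body>
<a href="a.html">a</a>
<a href="b.html">b</a>
</body></html>
EOF
  echo '<html><body>page a</body></html>' > "$dir/a.html"
  echo '<html><body>page b</body></html>' > "$dir/b.html"
}

# Example session (assumes python3 and wget are installed):
#   make_demo_site site
#   (cd site && python3 -m http.server 8000)   # terminal 1, prints the access log
#   wget -r -nc http://127.0.0.1:8000/         # terminal 2, first run
#   rm 127.0.0.1\:8000/b.html
#   wget -r -nc http://127.0.0.1:8000/         # second run
```

Per the logs above (wget 1.19.1), the second run should show requests only for robots.txt and b.html.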
