There is an online HTTP directory that I have access to. I have tried to download all sub-directories and files via wget. But the problem is that when wget downloads sub-directories, it downloads the index.html file, which contains the list of files in that directory, without downloading the files themselves.
Is there a way to download the sub-directories and files without a depth limit (as if the directory I want to download were just a folder I want to copy to my computer)?
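For example (hostname and path are just placeholders), pointing wget at the directory URL without any recursion options,

wget http://hostname/aaa/bbb/ccc/ddd/

only saves that directory's listing page as index.html, not the files it lists.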
Solution:
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
Explanation:
It will download all files and subfolders in the ddd directory:
- -r : recursively
- -np : not going to upper directories, like ccc/…
- -nH : not saving files to the hostname folder
- --cut-dirs=3 : but saving them to ddd by omitting the first 3 folders aaa, bbb, ccc
- -R index.html : excluding index.html files
Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
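To see the combined effect of those flags, here is roughly what the command produces locally, assuming ddd contains a file file1.txt and a subfolder sub with file2.txt (hypothetical names):

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
# results in:
#   ddd/file1.txt
#   ddd/sub/file2.txt
# without -nH and --cut-dirs=3 it would instead be saved
# under hostname/aaa/bbb/ccc/ddd/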
I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag. Also found that the -no-parent flag is important; otherwise it will try to download everything.
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
From man wget
‘-r’
‘--recursive’
Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.
‘-np’
‘--no-parent’
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.
‘-nH’
‘--no-host-directories’
Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
‘--cut-dirs=number’
Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs’ option works.
No options -> ftp.xemacs.org/pub/xemacs/
-nH -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .
--cut-dirs=1 -> ftp.xemacs.org/xemacs/
...
If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose subdirectories; for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed in xemacs/beta, as one would expect.
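One detail relevant to the question's "without depth limit": per the ‘-r’ entry above, recursion stops at depth 5 by default. A sketch of the same command with unlimited depth (‘-l inf’ is the manual's spelling for infinite recursion; URL as before):

wget -r -l inf -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/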
wget is an invaluable resource and something I use myself. However, sometimes there are characters in the address that wget identifies as syntax errors. I'm sure there is a fix for that, but as this question did not ask specifically about wget, I thought I would offer an alternative for those people who will undoubtedly stumble upon this page looking for a quick fix with no learning curve required.
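As an aside, one common cause of those apparent syntax errors is the shell rather than wget itself: characters such as & and ? in a URL are interpreted by the shell unless the URL is quoted. A minimal sketch with a made-up URL:

wget -r -np "http://hostname/dir/?C=M&O=A"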
There are a few browser extensions that can do this, but most require installing download managers, which aren't always free, tend to be an eyesore, and use a lot of resources. Here's one that has none of these drawbacks:
"Download Master" is an extension for Google Chrome that works great for downloading from directories. You can choose to filter which file-types to download, or download the entire directory.
https://chrome.google.com/webstore/detail/download-master/dljdacfojgikogldjffnkdcielnklkce
For an up-to-date feature list and other information, visit the project page on the developer's blog:
http://monadownloadmaster.blogspot.com/
No plugins required!
Use a bookmarklet: create a new bookmark, then edit it and paste this code in as the URL:
javascript:(function(){ var l=document.links; var ext=prompt("Select extension to download (all links containing it will be downloaded):", ".mp3"); for(var i=0; i<l.length; i++) { if(l[i].href.indexOf(ext) !== -1){ l[i].setAttribute("download","download"); l[i].click(); } } })();
Then go to the page you want to download files from and click the bookmarklet.
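Note that this relies on the HTML download attribute, which browsers generally honor only for same-origin links; that is usually fine here, since a directory listing and the files it links to share an origin.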