I have the following site http://www.asd.com.tr. I want to download all PDF files into one directory. I've tried a couple of commands but am not having much luck.
$ wget --random-wait -r -l inf -nd -A pdf http://www.asd.com.tr/
With this command only four PDF files were downloaded, even though several thousand PDFs are available on the site. For instance, http://www.asd.com.tr/Folders/ contains a number of subfolders with hundreds of files in each, but I can't figure out how to crawl them so that every PDF gets downloaded.
I've also tried mirroring the site with wget's -m option, but that failed too.
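The mirror attempt looked something like this:

$ wget --random-wait -m -A pdf http://www.asd.com.tr/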
Any more suggestions?
First, verify that the site's terms of service permit crawling it. Then, one solution is to list the links with mech-dump and fetch the PDF ones with wget.
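A minimal sketch of that pipeline, assuming the PDF links appear directly on the folder page and are given relative to it:

# List every link on the page, keep the ones ending in .pdf,
# and download each into the current directory.
mech-dump --links 'http://www.asd.com.tr/Folders/' |
  grep -i '\.pdf$' |
  xargs -I{} wget --no-clobber 'http://www.asd.com.tr/Folders/{}'

If the file names contain spaces or other characters that need URL-encoding, pass the list through sed (or similar) before handing it to wget.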
The mech-dump command comes with Perl's WWW::Mechanize module (the libwww-mechanize-perl package on Debian and Debian-like distributions).