I have some files in a directory tree which is being served over HTTP. Given some sub-directory A in that directory tree, I want to be able to download directory A and all the subdirectories and files it contains.
It seems likely that a simple/direct/atomic solution exists in some dark corner of Java. Does anyone know how to do this?
A webcrawler will not solve my problem since files in sub-directories may link to directories that are not subdirectories.
==Update==
The directories and files must be hosted in a static manner.
The server is statically hosting files in a directory tree, the client is running Java and attempting to copy some branch of the directory tree using HTTP.
VFS is the answer to this. Unfortunately, I answered the question myself and so can't choose it as the answer until two days from now. If someone would write up my answer, I would be happy to mark their write-up as the answer.
==Further Update==
VFS is in fact not the answer. VFS will not list directories over HTTP, as stated here. There do seem to be a few people who are interested in that functionality.
If I am not terribly mistaken, HTTP does not tell you anything about the "structure" of the server side - if such a thing even exists.
Think about REST where the URI does not really tell you where to find a file on the server, but could merely trigger some action, retrieve data or the like.
So I do not think what you are trying to achieve can be done reliably, be it with Java or any other language. Or maybe I am getting you wrong here?
Assuming you have control over both the server and client, I would write a page (in the technology of your choice: ASP, JSP, PHP, etc.) that reads the server directory structure and dynamically returns a page consisting of links to each file to be downloaded.
Then client side you can trigger a download of each link.
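As a rough illustration of the server-side half of this idea, here is a minimal Java sketch that walks a directory and produces a list of relative file paths. The class name `ManifestBuilder` and its methods are hypothetical, and it returns plain text (one path per line) rather than an HTML page of links, simply because that is easier for a client to parse. How the result gets served (JSP, servlet, or anything else) is left open.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds a listing of every file below a root directory.
final class ManifestBuilder {

    /**
     * Walks root recursively and returns the relative path of every regular
     * file, sorted, using '/' as the separator so that a client can append
     * each entry to the base URL of the served directory.
     */
    static List<String> listFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(p -> root.relativize(p).toString()
                            .replace(File.separatorChar, '/'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Renders the listing as plain text, one relative path per line. */
    static String render(Path root) throws IOException {
        return String.join("\n", listFiles(root));
    }
}
```

The client would then fetch this listing once and request each path relative to the URL of the served directory.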
What is the client-side technology? Is the thing doing the downloading an application of some sort, or a web browser? Does it have to have a client interface?
If this is some sort of in-house utility program, maybe you can just FTP instead? Having FTP access open on a server and downloading a directory would be easy...
Adding another possible answer:
If the server does not have directory listings turned on, then you basically have to make a modification server side. The easiest thing would be to just make a page that returns the dir structure to the client in a known format (see my 1st answer above).
If you control the server and have directory listings on, and you are always using the same server program (IIS, Tomcat, JBoss, etc) then you might be able to just make the client webcrawl the directory listings. For example, in a directory listing from IIS, you can tell which links are directories and which are files because it always puts a '/' at the end of a directory link, and shows 'dir' instead of a file size:
You can tell here that the 1st link is a directory, and the 2nd is an actual file.
So if you are using a consistent server app, just take a look at how the directory listing is returned. Maybe you'll get lucky.
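To make the trailing-slash heuristic concrete, here is a small Java sketch that pulls the `href` values out of a listing page and sorts them into directories and files. The class name `ListingLinks` is hypothetical, and the regular expression assumes double-quoted `href` attributes; real listings differ between servers, so check it against what your server actually returns. It also skips links starting with `..` or `?`, on the assumption that those are parent-directory and column-sorting links rather than entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: classifies the links found in a directory listing page.
final class ListingLinks {
    // Assumes double-quoted href attributes, e.g. <a href="docs/">.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    final List<String> directories = new ArrayList<>();
    final List<String> files = new ArrayList<>();

    static ListingLinks parse(String html) {
        ListingLinks result = new ListingLinks();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            // Assumption: ".." links point to the parent and "?" links only
            // re-sort the listing, so neither is a real entry.
            if (href.isEmpty() || href.startsWith("..") || href.startsWith("?")) {
                continue;
            }
            if (href.endsWith("/")) {
                result.directories.add(href); // trailing '/' marks a directory
            } else {
                result.files.add(href);
            }
        }
        return result;
    }
}
```

Note that this only classifies links; it does not check whether a link stays inside the directory you started from, which a crawler would still have to do.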
Talk about low-hanging fruit ;-) Thanks for the offer, e5!
Commons VFS provides a single API for accessing various different file systems. It presents a uniform view of the files from various different sources, such as the files on local disk, on an HTTP server, or inside a Zip archive.
http://commons.apache.org/vfs/
I don't know of an atomic solution, but the most straightforward one would be using a URLConnection to fetch the sub-directory (assuming the server lists the directory) and then parse the response, look for contents of that directory and use URLConnection again to fetch each of the files under it.
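A minimal sketch of that approach might look like the following. It assumes the server returns an HTML directory listing with double-quoted, properly percent-encoded `href` attributes, that directory links end in `/`, and that Java 9+ is available (for `InputStream.readAllBytes`). The class `SubtreeDownloader` and its method names are made up for illustration. To avoid wandering outside the requested branch, it only follows links that resolve to a URL strictly below the directory currently being read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: copies one branch of an HTTP directory listing.
final class SubtreeDownloader {
    // Matches double-quoted hrefs that carry no query string or fragment.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*\"([^\"?#]+)\"", Pattern.CASE_INSENSITIVE);

    /** True if candidate lies strictly below base (base should end with '/'). */
    static boolean isBelow(URI base, URI candidate) {
        String b = base.normalize().toString();
        String c = candidate.normalize().toString();
        return c.startsWith(b) && c.length() > b.length();
    }

    /** Maps a URL below base to the corresponding path below targetDir. */
    static Path localPath(URI base, URI url, Path targetDir) {
        String relative = base.normalize().relativize(url.normalize()).getPath();
        return targetDir.resolve(relative).normalize();
    }

    /** Reads the listing at dir and recursively saves everything below it. */
    static void download(URI base, URI dir, Path targetDir) throws IOException {
        URLConnection conn = dir.toURL().openConnection();
        String html;
        try (InputStream in = conn.getInputStream()) {
            html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI link = dir.resolve(m.group(1)).normalize();
            if (!isBelow(dir, link)) {
                continue; // parent links and links leaving this branch
            }
            if (link.getPath().endsWith("/")) {
                download(base, link, targetDir); // assumed to be a sub-directory
            } else {
                Path out = localPath(base, link, targetDir);
                Files.createDirectories(out.getParent());
                try (InputStream in = link.toURL().openStream()) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

A call would look something like `SubtreeDownloader.download(base, base, target)`, where `base` is the URI of directory A (ending in `/`) and `target` is a local directory. This does nothing useful if the server has directory listings switched off.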
Based on these answers, now I am wondering if you meant the Java to be on the client side or server side!
So you want the client side to retrieve a list of all files and directories for a particular URL on the server side, as if it were a local disk file system folder? That's usually not possible when the server doesn't have directory indexing enabled. And even then, you still need to parse the HTML page which represents the directory index and extract all <a> elements representing the files and folders yourself. There's no normal java.io.File approach for this. That would have been a huge security hole. One would, for example, be able to download all source files from http://gmail.com. HTTP is not meant as a file transfer protocol. Use FTP. That's what it stands for.
For the first time in a while Google beat Stack Overflow: Apache Commons VFS does exactly what I need.
==Update==
As stated in the question, VFS only pretends to solve this problem, since it doesn't allow the listing of HTTP directories.