I need to develop an iterator over a filesystem subtree in Java. The state of the filesystem might change while the iteration is still in progress (e.g. new folders and files get created and deleted). The iterator should therefore first capture a snapshot of the hierarchy (e.g. crawl the tree and save the names of all files found to a list) and then iterate over the snapshot.
I am wondering whether it is a good idea to put the code that creates the cache into the iterator's constructor. An alternative would be to designate a special method for that (named init).
The size and depth of the iterated subtree might get quite large, so the caching will be time-consuming. Moreover, it might throw IOExceptions (I am still not sure whether it is good design practice to throw exceptions from constructors in Java).
On the other hand, creating a dedicated method to initialize the iterator would mean the client code could not treat the iterator simply as an implementation of the Iterator interface.
The client code would also be responsible for calling the init method prior to the traversal. I could have the hasNext/next methods first check whether the iterator has been initialized and, if not, call the init method themselves. But that would make the first call to these methods significantly slower than the subsequent ones, for no reason visible from the client side.
Creating the cache in the constructor should be fine. As for the time-consuming part, you need to decide based on how you are going to use the iterators. If the clients can't iterate until the cache is complete, it doesn't matter whether it's the constructor or the init method that takes the time: either way it's a synchronous, blocking operation.
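For illustration, an eager constructor could look roughly like the sketch below. The class name SnapshotFileIterator and the helper takeSnapshot are made up for this example, and it assumes that only regular files are of interest. Declaring IOException on a constructor is legal in Java; if the crawl fails, the caller simply never gets an instance.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name: the whole subtree is crawled once, in the constructor.
final class SnapshotFileIterator implements Iterator<Path> {
    private final Iterator<Path> delegate;

    // Blocks until the crawl is done; a failed crawl means no iterator is created.
    SnapshotFileIterator(Path root) throws IOException {
        this.delegate = takeSnapshot(root).iterator();
    }

    // Hypothetical helper: collects every regular file under root into a list.
    private static List<Path> takeSnapshot(Path root) throws IOException {
        // Files.walk is lazy and keeps directories open, so close the stream promptly.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            // I/O errors hit while the stream is being consumed arrive wrapped.
            throw e.getCause();
        }
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public Path next() {
        return delegate.next();
    }
}
```

Files created or deleted after the constructor returns do not affect what the iterator yields, since it only walks the saved list.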
If you can start iterating before the cache has finished, you can start a thread that does the caching, but you'll need to implement hasNext() to take this into account, and either hasNext() or next() will be the one left waiting.
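A rough sketch of that threaded variant, under some assumptions of mine: the class name BackgroundSnapshotIterator is hypothetical, the crawler hands paths over through a BlockingQueue, and an empty Optional serves as the end-of-crawl marker. Here it is hasNext() that waits. Note that this is no longer a strict snapshot, because iteration overlaps with the crawl.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Stream;

// Hypothetical class name: a background thread crawls while the client iterates.
final class BackgroundSnapshotIterator implements Iterator<Path> {
    private final BlockingQueue<Optional<Path>> queue = new LinkedBlockingQueue<>();
    private volatile IOException failure; // set by the crawler thread on error
    private Optional<Path> lookahead;     // null = nothing fetched yet, empty = crawl finished

    BackgroundSnapshotIterator(Path root) {
        Thread crawler = new Thread(() -> {
            try (Stream<Path> paths = Files.walk(root)) {
                paths.filter(Files::isRegularFile).forEach(p -> queue.add(Optional.of(p)));
            } catch (IOException e) {
                failure = e;
            } catch (UncheckedIOException e) {
                failure = e.getCause();
            } finally {
                queue.add(Optional.empty()); // end marker, sent after a failure as well
            }
        }, "snapshot-crawler");
        crawler.setDaemon(true);
        crawler.start();
    }

    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            try {
                lookahead = queue.take(); // blocks until the crawler delivers an item or the end marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the crawler", e);
            }
        }
        if (!lookahead.isPresent() && failure != null) {
            throw new UncheckedIOException(failure); // report the crawl error to the client
        }
        return lookahead.isPresent();
    }

    @Override
    public Path next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Path result = lookahead.get();
        lookahead = null; // force the next hasNext() to fetch again
        return result;
    }
}
```

The constructor returns immediately, so the cost moves into the calls to hasNext(), which is exactly the trade-off described above.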
As you said in the comments, I would separate the responsibilities into two classes: one for taking the snapshot of the file system (e.g. FileSystemSnapshot) and one for iterating over it. Depending on the flexibility you need, you can create the FileSystemSnapshot instance in the constructor of the iterator or pass it in as a constructor argument. Passing it in gives the client more flexibility to configure the iterator and can be valuable if you are planning to have, for example, different strategies for taking file system snapshots. It is also better for unit testing, since it is easy to create mocks or stubs. However, you are forcing the client to know about the traversal details (i.e. that the file system has to be cached before traversing it). Creating it inside the iterator hides these implementation details from the client, but is less flexible and a little trickier to test (here you could define a createFileSystemSnapshot() method and then mock that method to return a different instance for your tests). You may also want to look into the dependency injection pattern. HTH
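A minimal sketch of the two-class split, offering both constructor styles side by side. Apart from FileSystemSnapshot, every name here (FileSystemIterator, capture, files) is an assumption made for the example, as is the choice to keep only regular files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Takes and holds the snapshot; knows nothing about iteration.
final class FileSystemSnapshot {
    private final List<Path> files;

    // Accepting a plain list makes it easy to build a stub in a unit test.
    FileSystemSnapshot(List<Path> files) {
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
    }

    // Hypothetical factory that crawls the subtree once.
    static FileSystemSnapshot capture(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return new FileSystemSnapshot(
                    paths.filter(Files::isRegularFile).collect(Collectors.toList()));
        }
    }

    List<Path> files() {
        return files;
    }
}

// Iterates over a snapshot; knows nothing about how it was taken.
final class FileSystemIterator implements Iterator<Path> {
    private final Iterator<Path> delegate;

    // The caller supplies the snapshot (flexible, easy to test).
    FileSystemIterator(FileSystemSnapshot snapshot) {
        this.delegate = snapshot.files().iterator();
    }

    // The iterator creates the snapshot itself (hides the caching from the client).
    FileSystemIterator(Path root) throws IOException {
        this(FileSystemSnapshot.capture(root));
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public Path next() {
        return delegate.next();
    }
}
```

In a test, the iterator can then be fed a hand-built snapshot, for example new FileSystemIterator(new FileSystemSnapshot(Arrays.asList(Paths.get("a.txt"), Paths.get("b.txt")))), without touching the disk.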