So I am thinking about building a hobby project, one off kind of thing, just to brush up on my programming/design.
It's basically a multi threaded web spider, updating the same data structure object->int.
So it is definitely overkill to use a database for this, and the only thing I could think of is a thread-safe singleton used to contain my data structure. http://web.archive.org/web/20121106190537/http://www.ibm.com/developerworks/java/library/j-dcl/index.html
Is there a different approach I should look in to?
The article you referenced only talks about making the creation of the singleton object, presumably a collection in this case, thread-safe. You also need a thread-safe collection so that the collection operations also work as expected. Make sure that the underlying collection in the singleton is synchronized, perhaps using a ConcurrentHashMap.
Why don't you create a data structure you pass to each of the threads as dependency injection. That way you don't need a singleton. You still need to make the thread safe.
as Joshua Bloch argues in his book "effective java 2nd edition" I also agree that a single element enum type is the best way to implement a singleton.
If you look at the very bottom of that article, you'll see the suggestion to just use a static field. That would be my inclination: you don't really need lazy instantiation (so you don't need
getInstance()
to be both an accessor and a factory method). You just want to ensure that you have one and only one of these things. If you really need global access to one such thing, I'd use that code sample towards the very bottom:Note that the Singleton is now constructed during the installation of static fields. This should work and not face the threading risks of potentially mis-synchronizing things.
All that said, perhaps what you really need is one of the thread-safe data structures available in the modern JDKs. For example, I'm a big fan of the ConcurrentHashMap: thread safety plus I don't have to write the code (FTW!).
Using lazy initialization for the database in a web crawler is probably not worthwhile. Lazy initialization adds complexity and an ongoing speed hit. One case where it is justified is when there is a good chance the data will never be needed. Also, in an interactive application, it can be used to reduce startup time and give the illusion of speed.
For a non-interactive application like a web-crawler, which will surely need its database to exist right away, lazy initialization is a poor fit.
On the other hand, a web-crawler is easily parallelizable, and will benefit greatly from being multi-threaded. Using it as an exercise to master the
java.util.concurrent
library would be extremely worthwhile. Specifically, look atConcurrentHashMap
andConcurrentSkipListMap
, which will allow multiple threads to read and update a shared map.When you get rid of lazy initialization, the simplest Singleton pattern is something like this:
The keyword
final
is the key here. Even if you provide astatic
"getter" for the singleton rather than allowing direct field access, making the singletonfinal
helps to ensure correctness and allows more aggressive optimization by the JIT compiler.Double-checked locking has been proven to be incorrect and flawed (as least in Java). Do a search or look at Wikipedia's entry for the exact reason.
First and foremost is program correctness. If your code is not thread-safe (in a multi-threaded environment) then it's broken. Correctness comes first before performance optimization.
To be correct you'll have to synchronize the whole
getInstance
methodor statically initialize it