Why is ThreadLocalRandom implemented so bizarrely?

Posted 2019-04-20 17:48

Question:

This question regards the implementation of ThreadLocalRandom in OpenJDK version 1.8.0.

ThreadLocalRandom provides a per-thread random number generator without the synchronization overhead imposed by Random. The most obvious implementation (IMO) would be something like this, which appears to preserve backward compatibility without much complexity:

public class ThreadLocalRandom extends Random {
    private static final ThreadLocal<ThreadLocalRandom> tl =
        ThreadLocal.withInitial(ThreadLocalRandom::new);
    public static ThreadLocalRandom current() {
        return tl.get();
    }
    // Random methods moved here without synchronization
    // stream methods here
}

public class Random {
    private ThreadLocalRandom delegate = new ThreadLocalRandom();
    // methods synchronize and delegate for backward compatibility
}

However, the actual implementation is totally different and quite bizarre:

  • ThreadLocalRandom duplicates some of the methods in Random verbatim and others with minor modifications; surely much of this code could have been reused.
  • Thread stores the seed and a probe variable used to initialize the ThreadLocalRandom, violating encapsulation;
  • ThreadLocalRandom uses Unsafe to access the variables in Thread, which I suppose is because the two classes are in different packages yet the state variables must be private in Thread - Unsafe is only necessary because of the encapsulation violation;
  • ThreadLocalRandom stores its next nextGaussian in a static ThreadLocal instead of in an instance variable as Random does.

Overall my cursory inspection seems to reveal an ugly copy of Random with no advantages over the simple implementation above. But the authors of the standard library are smart so there must be some reason for this weird approach. Does anyone have any insight into why ThreadLocalRandom was implemented this way?

Answer 1:

The key problem is that a lot of the code is legacy and can't (easily) be changed. Random was designed to be thread-safe: each instance guards its own seed (in OpenJDK 8 the seed is an AtomicLong updated by compare-and-set, and setSeed and nextGaussian are additionally synchronized). This works, in that a single Random instance can be used across multiple threads, but it becomes a severe bottleneck under load because every thread competes to update the same seed. A simple solution would be to construct a ThreadLocal<Random>, thereby avoiding the contention, but this still isn't ideal. The thread-safety machinery still carries some overhead even when uncontended, and constructing n Random instances is wasteful when they're all essentially doing the same job.
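To make the two alternatives concrete, here is a minimal sketch that draws a bounded int once through a ThreadLocal<Random> holder and once through ThreadLocalRandom.current(). The class name PerThreadRandomDemo and its helper methods are invented for this illustration, and the sketch makes no claim about relative speed; it only shows that the call sites look almost identical.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical demo class; not part of the JDK.
class PerThreadRandomDemo {
    // Alternative 1: one Random object per thread, created lazily.
    // Avoids cross-thread contention but allocates a Random for every thread.
    private static final ThreadLocal<Random> PER_THREAD =
        ThreadLocal.withInitial(Random::new);

    static int viaThreadLocal(int bound) {
        return PER_THREAD.get().nextInt(bound);
    }

    // Alternative 2: the JDK's own per-thread generator.
    static int viaThreadLocalRandom(int bound) {
        return ThreadLocalRandom.current().nextInt(bound);
    }

    public static void main(String[] args) {
        System.out.println(viaThreadLocal(100));
        System.out.println(viaThreadLocalRandom(100));
    }
}
```

Both helpers return a value in the range [0, bound); the difference the answer cares about is what happens behind get() and current().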

So at a high level ThreadLocalRandom exists as a performance optimization, hence it makes sense that its implementation would look "bizarre": the JDK devs have put time into optimizing it.

There are many classes in the JDK that, at first glance, are "ugly". Remember, however, that the JDK authors are solving a different problem than you are. The code they write will be used by thousands if not millions of developers in countless ways. They regularly have to trade off best practices for efficiency because the code they're writing is so mission-critical.

Effective Java, Item 55 ("Optimize judiciously" in the 2nd edition), also addresses this issue. The key point is that optimization should be done as a last resort, by experts. The JDK devs are those experts.

To your specific questions:

ThreadLocalRandom duplicates some of the methods in Random verbatim and others with minor modifications; surely much of this code could have been reused.

Unfortunately no, as the methods on Random carry its thread-safety overhead. If they were invoked, ThreadLocalRandom would pull in Random's contention trouble. TLR needs to override those methods in order to strip that overhead out.

Thread stores the seed and a probe variable used to initialize the ThreadLocalRandom, violating encapsulation;

First off, it's really not "violating encapsulation", since the fields are still package-private. They're encapsulated from users, which is the goal. I wouldn't get too hung up on this, as the decisions were made here to improve performance. Sometimes performance comes at the cost of normal good design. In practice this "violation" is undetectable. The behavior is simply encapsulated inside two classes instead of a single one.

Putting the seed inside Thread allows ThreadLocalRandom to be totally stateless (aside from the initialized field, which is largely irrelevant), and therefore only a single instance ever needs to exist across the whole application.
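One way to observe the single-instance point is to compare the object returned by ThreadLocalRandom.current() on two different threads. The sketch below does that; SharedInstanceDemo and sameInstanceAcrossThreads are names made up for this example. On OpenJDK 8 and later I would expect the comparison to come out true, but that is an implementation detail and not something the documented API promises, so treat it as an observation, not a contract.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; not part of the JDK.
class SharedInstanceDemo {
    // Returns true if current() hands back the very same object on two threads.
    static boolean sameInstanceAcrossThreads() {
        ThreadLocalRandom here = ThreadLocalRandom.current();
        AtomicReference<ThreadLocalRandom> there = new AtomicReference<>();

        // Capture the reference only; never generate numbers through
        // a ThreadLocalRandom obtained on another thread.
        Thread worker = new Thread(() -> there.set(ThreadLocalRandom.current()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return here == there.get();
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAcrossThreads());
    }
}
```

The per-thread part is the seed stored in each Thread, not the ThreadLocalRandom object itself, which is why the reference should still only be used on the thread that called current().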

ThreadLocalRandom uses Unsafe to access the variables in Thread, which I suppose is because the two classes are in different packages yet the state variables must be private in Thread - Unsafe is only necessary because of the encapsulation violation;

Many JDK classes use Unsafe. It's a tool, not a sin. Again, I just wouldn't get too stressed out about this fact. The class is called Unsafe to discourage lay-developers from misusing it. We trust/hope the JDK authors are smart enough to know when it's safe to use.

ThreadLocalRandom stores its next nextGaussian in a static ThreadLocal instead of in an instance variable as Random does.

Since there will only ever be one instance of ThreadLocalRandom there's no need for this to be an instance variable. I suppose you could alternatively make the case that there's no need for it to be a static either, but at that point you're just debating style. At a minimum making it static more clearly leaves the class essentially stateless. As mentioned in the file, this field is not really necessary, but ensures similar behavior to Random.