Why does using different ArrayList constructors ca

Published 2020-05-26 16:55

Question:

I seem to have stumbled across something interesting in the ArrayList implementation that I can't wrap my head around. Here is some code that shows what I mean:

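import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.VarHandle;
import java.util.ArrayList;
import java.util.List;

// Requires Java 9+ (VarHandle, privateLookupIn). On Java 16+ run with
// --add-opens java.base/java.util=ALL-UNNAMED, or privateLookupIn is refused.
public class Sandbox {

    // Gives read access to ArrayList's private backing array, elementData
    private static final VarHandle VAR_HANDLE_ARRAY_LIST;

    static {
        try {
            Lookup lookupArrayList = MethodHandles.privateLookupIn(ArrayList.class, MethodHandles.lookup());
            VAR_HANDLE_ARRAY_LIST = lookupArrayList.findVarHandle(ArrayList.class, "elementData", Object[].class);
        } catch (ReflectiveOperationException e) {
            // keep the cause instead of printing it and throwing a bare RuntimeException
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {

        // No-arg constructor: the first add inflates the array to the default capacity
        List<String> defaultConstructorList = new ArrayList<>();
        defaultConstructorList.add("one");

        Object[] elementData = (Object[]) VAR_HANDLE_ARRAY_LIST.get(defaultConstructorList);
        System.out.println(elementData.length); // prints 10

        // Explicit capacity 0: the first add allocates just one slot
        List<String> zeroConstructorList = new ArrayList<>(0);
        zeroConstructorList.add("one");

        elementData = (Object[]) VAR_HANDLE_ARRAY_LIST.get(zeroConstructorList);
        System.out.println(elementData.length); // prints 1

    }
}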

The idea is that if you create an ArrayList like this:

List<String> defaultConstructorList = new ArrayList<>();
defaultConstructorList.add("one");

and look inside elementData (the Object[] in which all elements are kept), its length will be 10. So you add one element and get 9 additional slots that are unused.

If, on the other hand, you do:

List<String> zeroConstructorList = new ArrayList<>(0);
zeroConstructorList.add("one");

you add one element, and the space reserved is just for that element, nothing more.

Internally this is achieved via two fields:

/**
 * Shared empty array instance used for empty instances.
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
 * first element is added.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

When you create an ArrayList via new ArrayList(0), EMPTY_ELEMENTDATA is used.

When you create an ArrayList via new ArrayList(), DEFAULTCAPACITY_EMPTY_ELEMENTDATA is used.
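The whole trick is an identity check on two otherwise identical empty arrays. Below is a minimal sketch of a hypothetical `SentinelList` class (not the JDK source) that keeps two shared empty arrays and decides how far to inflate on the first add by comparing references with `==`. It assumes the grow-by-half rule quoted further down in this thread and leaves out overflow checks, `modCount` and everything else a real `List` needs.

```java
import java.util.Arrays;

public class SentinelList {
    // Hypothetical stand-ins for ArrayList's two shared empty arrays.
    // Each {} creates a distinct array object, so == can tell them apart.
    private static final Object[] EMPTY_ELEMENTDATA = {};
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
    private static final int DEFAULT_CAPACITY = 10;

    private Object[] elementData;
    private int size;

    public SentinelList() {
        elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

    public SentinelList(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        }
        elementData = initialCapacity == 0 ? EMPTY_ELEMENTDATA : new Object[initialCapacity];
    }

    public void add(Object e) {
        if (size == elementData.length) {
            grow(size + 1);
        }
        elementData[size++] = e;
    }

    private void grow(int minCapacity) {
        int newCapacity;
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            // Only the no-arg constructor's sentinel jumps straight to 10.
            newCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        } else {
            // Everything else (including EMPTY_ELEMENTDATA) grows by half, but at least to minCapacity.
            int oldCapacity = elementData.length;
            newCapacity = Math.max(oldCapacity + (oldCapacity >> 1), minCapacity);
        }
        elementData = Arrays.copyOf(elementData, newCapacity);
    }

    public int capacity() {
        return elementData.length;
    }

    public static void main(String[] args) {
        SentinelList byDefault = new SentinelList();
        byDefault.add("one");
        SentinelList byZero = new SentinelList(0);
        byZero.add("one");
        System.out.println(byDefault.capacity()); // 10
        System.out.println(byZero.capacity());    // 1
    }
}
```

A boolean field would carry the same information, but the sentinel costs nothing per instance, since it is the same static array shared by every empty list.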

My intuition simply screams "remove DEFAULTCAPACITY_EMPTY_ELEMENTDATA" and let all the cases be handled with EMPTY_ELEMENTDATA; of course, the code comment:

We distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when first element is added

does make sense, but why would one inflate to 10 (a lot more than I asked for) and the other one to 1 (exactly as much as I requested)?


Even if you use List<String> zeroConstructorList = new ArrayList<>(0) and keep adding elements, you will eventually reach a point where elementData is bigger than the number of elements it holds:

    List<String> zeroConstructorList = new ArrayList<>(0);
    zeroConstructorList.add("one");
    zeroConstructorList.add("two");
    zeroConstructorList.add("three");
    zeroConstructorList.add("four");
    zeroConstructorList.add("five"); // elementData will report 6, though there are 5 elements only

But it grows more gradually than with the default constructor.
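The numbers in the comment above follow from the grow-by-half rule quoted in one of the answers below (`oldCapacity + (oldCapacity >> 1)`, but never less than the capacity actually needed). A small sketch that replays that arithmetic, assuming the rule applies unchanged and ignoring overflow handling; `GrowthTrace` is just an illustrative name:

```java
import java.util.ArrayList;
import java.util.List;

public class GrowthTrace {
    // Capacity of the backing array after each add, starting from an explicit initial capacity.
    public static List<Integer> trace(int initialCapacity, int adds) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = initialCapacity;
        for (int size = 0; size < adds; size++) {
            if (size == capacity) { // array is full, so this add triggers a grow
                int minCapacity = size + 1;
                capacity = Math.max(capacity + (capacity >> 1), minCapacity);
            }
            capacities.add(capacity);
        }
        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(trace(0, 5)); // [1, 2, 3, 4, 6]
    }
}
```

For capacities below 4, `oldCapacity >> 1` is at most 1, so the array grows one slot at a time; from 4 on, the half-increment takes over.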


This reminds me of the HashMap implementation, where the number of buckets is almost always more than you asked for; but there it is done because the number of buckets has to be a power of two, which is not the case here.

So the question is - can someone explain this difference to me?

Answer 1:

You get precisely what you asked for, or rather what has been specified, even in older versions where the implementation was different:

ArrayList()

Constructs an empty list with an initial capacity of ten.

ArrayList(int)

Constructs an empty list with the specified initial capacity.

So, constructing the ArrayList with the default constructor will give you an ArrayList with an initial capacity of ten, so as long as the list size is ten or smaller, no resize operation will ever be needed.

In contrast, the constructor with the int argument will precisely use the specified capacity, subject to the growing policy which is specified as

The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.

which applies even when you specify an initial capacity of zero.
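To put numbers on "no resize operation will ever be needed", here is a sketch that counts how many backing arrays get allocated while elements are added one at a time. It is a hypothetical model, not JDK code: it assumes the lazy first inflation to 10 for the no-arg constructor and the grow-by-half rule quoted in another answer, and it does not count the shared zero-length sentinel arrays.

```java
public class ResizeCount {
    static final int DEFAULT_CAPACITY = 10;

    // Counts backing arrays allocated while adding `adds` elements one by one.
    // defaultCtor = true models new ArrayList<>() (lazy first inflation to 10);
    // otherwise the constructor allocates exactly initialCapacity slots.
    public static int allocations(boolean defaultCtor, int initialCapacity, int adds) {
        int capacity = defaultCtor ? 0 : initialCapacity;
        int count = capacity > 0 ? 1 : 0; // array allocated by the constructor, if any
        boolean lazyDefault = defaultCtor;
        for (int size = 0; size < adds; size++) {
            if (size == capacity) { // full: allocate a bigger array
                int minCapacity = size + 1;
                capacity = lazyDefault
                        ? Math.max(DEFAULT_CAPACITY, minCapacity)
                        : Math.max(capacity + (capacity >> 1), minCapacity);
                lazyDefault = false;
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(allocations(true, 0, 10));   // 1 (new ArrayList<>())
        System.out.println(allocations(false, 10, 10)); // 1 (new ArrayList<>(10))
        System.out.println(allocations(false, 0, 10));  // 7 (new ArrayList<>(0))
    }
}
```

Under these assumptions, ten adds cost one allocation with either new ArrayList<>() or new ArrayList<>(10), but seven with new ArrayList<>(0) (capacities 1, 2, 3, 4, 6, 9, 13).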

Java 8 added the optimization that the creation of the ten elements array is postponed until the first element is added. This is specifically addressing the common case that ArrayList instances (created with the default capacity) stay empty for a long time or even their entire lifetime. Further, when the first actual operation is addAll, it might skip the first array resize operation. This does not affect lists with an explicit initial capacity, as those are usually chosen carefully.
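The addAll remark can be illustrated with a simplified model. The sketch below (hypothetical names, assuming the grow-by-half rule quoted elsewhere in this thread) lists the sizes of the backing arrays allocated when a single addAll of at least one element is the first operation on a default-constructed list. With lazy allocation, the first array can be sized for all elements at once; with a 10-element array allocated eagerly by the constructor, adding 15 elements needs a second array.

```java
import java.util.Arrays;

public class AddAllFirstAllocation {
    static final int DEFAULT_CAPACITY = 10;

    // Lazy default: the empty sentinel is replaced by one array of
    // max(DEFAULT_CAPACITY, n) when addAll brings in n >= 1 elements.
    public static int[] lazyAllocations(int n) {
        return new int[] { Math.max(DEFAULT_CAPACITY, n) };
    }

    // Hypothetical eager model: 10 slots allocated by the constructor, then
    // one grow (grow-by-half, but at least n) if the n elements do not fit.
    public static int[] eagerAllocations(int n) {
        if (n <= DEFAULT_CAPACITY) {
            return new int[] { DEFAULT_CAPACITY };
        }
        int grown = Math.max(DEFAULT_CAPACITY + (DEFAULT_CAPACITY >> 1), n);
        return new int[] { DEFAULT_CAPACITY, grown };
    }

    public static void main(String[] args) {
        // Sizes of every backing array allocated for an addAll of 15 elements.
        System.out.println(Arrays.toString(lazyAllocations(15)));  // [15]
        System.out.println(Arrays.toString(eagerAllocations(15))); // [10, 15]
    }
}
```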

As stated in this answer:

According to our performance analysis team, approximately 85% of ArrayList instances are created at default size so this optimization will be valid for an overwhelming majority of cases.

The motivation was to optimize precisely these scenarios, not to touch the specified default capacity, which was defined back when ArrayList was created (though JDK 1.4 is the first version to specify it explicitly).



Answer 2:

If you use the default constructor, the idea is to balance memory usage against reallocation. Hence a small default capacity (10) is used that should be fine for most applications.

If you use the constructor with an explicit capacity, it is assumed that you know what you're doing. If you initialize it with 0, you are essentially saying: I am pretty sure this will either stay empty or not grow beyond very few elements.

Now if you look at the implementation of ensureCapacityInternal in OpenJDK, you can see that this difference only comes into play the first time you add an item:

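private void ensureCapacityInternal(int minCapacity) {
    // Quoted from an early JDK 8 build, where the no-arg constructor still used
    // EMPTY_ELEMENTDATA. In versions that have DEFAULTCAPACITY_EMPTY_ELEMENTDATA
    // (as in the question), this check is against that constant instead.
    if (elementData == EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }

    ensureExplicitCapacity(minCapacity);
}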

If the default constructor is used, the capacity grows to DEFAULT_CAPACITY (10) on the first add. This is to prevent too many reallocations if multiple elements are added. However, if you explicitly created this ArrayList with capacity 0, it will simply grow to capacity 1 when you add the first element. This is because you told it that you know what you're doing.

ensureExplicitCapacity basically just calls grow (with some range/overflow checks), so let's look at that:

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

As you can see, it doesn't simply grow to a specific size; it tries to be smart. The bigger the array is, the more it grows, even if minCapacity is just 1 bigger than the current capacity. The reasoning behind that is simple: the probability that a lot of items will be added is higher if the list is already big, and vice versa. This is also why you see growth increments of 1 at first and then of 2 when the 5th element is added.
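The `// overflow-conscious code` comment in grow() deserves a quick illustration, because for large lists `oldCapacity + (oldCapacity >> 1)` can exceed Integer.MAX_VALUE and wrap around to a negative int. The sketch below copies the arithmetic quoted above into static methods. MAX_ARRAY_SIZE and hugeCapacity are not shown in the answer and are assumed here to match JDK 8 (MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8); `naiveNewCapacity` is a hypothetical version comparing with `<` directly, to show what the subtraction-based checks avoid.

```java
public class OverflowConsciousGrow {
    // Assumed to match JDK 8's ArrayList constant.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Same arithmetic as the grow() quoted above, returning the capacity instead of copying.
    public static int newCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1); // may wrap around to a negative int
        if (newCapacity - minCapacity < 0)
            newCapacity = minCapacity;
        if (newCapacity - MAX_ARRAY_SIZE > 0)
            newCapacity = hugeCapacity(minCapacity);
        return newCapacity;
    }

    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // minCapacity itself overflowed
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    // Hypothetical "obvious" version that compares with < directly.
    public static int naiveNewCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        return newCapacity;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 11));            // 15: the usual +50%
        int old = 1_500_000_000;                             // 1.5x of this exceeds Integer.MAX_VALUE
        System.out.println(newCapacity(old, old + 1));      // 2147483639 (MAX_ARRAY_SIZE)
        System.out.println(naiveNewCapacity(old, old + 1)); // 1500000001: grows by a single slot
    }
}
```

With the subtraction-based checks, the wrapped negative value makes `newCapacity - MAX_ARRAY_SIZE` wrap to a positive number, so it is routed to hugeCapacity. The naive comparison instead falls back to minCapacity and would grow by one element per add from then on.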



Answer 3:

The short answer to your question is what the doc comment says: we have two constants because we now need to be able to distinguish the two different initializations later (see below).

Instead of two constants, they could of course have introduced, e.g., a boolean field in ArrayList, private boolean initializedWithDefaultCapacity; but that would require additional memory per instance, which seems to go against the goal of saving a few bytes of memory.

Why do we need to distinguish those two?

Looking at ensureCapacity() we see what happens with DEFAULTCAPACITY_EMPTY_ELEMENTDATA:

public void ensureCapacity(int minCapacity) {
    int minExpand = (elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA)
        // any size if not default element table
        ? 0
        // larger than default for default empty table. It's already
        // supposed to be at default size.
        : DEFAULT_CAPACITY;

    if (minCapacity > minExpand) {
        ensureExplicitCapacity(minCapacity);
    }
}

It seems that it is done this way to be somewhat 'compatible' with the behavior of the old implementation:

If you did initialize the list with the default capacity, it will actually be initialized with an empty array now, but, as soon as the first element is inserted, it will basically revert to the same behavior as the old implementation, i.e. after the first element is added, the backing array has the DEFAULT_CAPACITY and from then on, the list behaves the same as previously.

If, on the other hand, you explicitly specify an initial capacity, the array does not 'jump' to DEFAULT_CAPACITY but grows relative to your specified initial capacity.

I figure the reason for this 'optimization' may be cases where you know you will only be storing one or two (i.e. fewer than DEFAULT_CAPACITY) elements in the list and you specify the initial capacity accordingly; in these cases, for example for a single-element list, you only get a single-element array instead of a DEFAULT_CAPACITY-sized one.
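The `minExpand` logic in ensureCapacity() can be exercised without reflection through a small model. `EnsureCapacityModel` below is hypothetical: it only tracks a capacity number plus a flag standing in for "elementData is still DEFAULTCAPACITY_EMPTY_ELEMENTDATA", and it assumes the grow-by-half rule quoted in another answer. It shows that ensureCapacity(5) is ignored on a default-constructed empty list (the first add will allocate 10 anyway) but honored on new ArrayList<>(0).

```java
public class EnsureCapacityModel {
    static final int DEFAULT_CAPACITY = 10;

    // Both shared sentinels are zero-length arrays, so only this flag tells them apart.
    private boolean defaultSentinel;
    private int capacity = 0;

    public EnsureCapacityModel(boolean defaultSentinel) {
        this.defaultSentinel = defaultSentinel;
    }

    // Same decision as the ensureCapacity() quoted above.
    public void ensureCapacity(int minCapacity) {
        int minExpand = defaultSentinel ? DEFAULT_CAPACITY : 0;
        if (minCapacity > minExpand) {
            ensureExplicitCapacity(minCapacity);
        }
    }

    // Grows by half, but at least to minCapacity, only if the array is too small.
    private void ensureExplicitCapacity(int minCapacity) {
        if (minCapacity > capacity) {
            capacity = Math.max(capacity + (capacity >> 1), minCapacity);
            defaultSentinel = false; // a real array now replaces the sentinel
        }
    }

    public int capacity() {
        return capacity;
    }

    public static void main(String[] args) {
        EnsureCapacityModel byDefault = new EnsureCapacityModel(true);
        byDefault.ensureCapacity(5);  // ignored: the first add will allocate 10 anyway
        System.out.println(byDefault.capacity()); // 0

        EnsureCapacityModel byZero = new EnsureCapacityModel(false);
        byZero.ensureCapacity(5);     // honored: exactly 5 slots
        System.out.println(byZero.capacity());    // 5

        EnsureCapacityModel big = new EnsureCapacityModel(true);
        big.ensureCapacity(20);       // above the default, so it is honored
        System.out.println(big.capacity());       // 20
    }
}
```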

Don't ask me what the practical benefit is of saving nine array elements of a reference type. It might be up to about 9*64 bit = 72 bytes of RAM per list (half that with compressed references). Yay. ;-)



Answer 4:

This is most likely because the two constructors have different perceived default uses.

The default (empty) constructor assumes that this will be a "typical ArrayList". Therefore, the number 10 is chosen as a sort of heuristic, meaning "a typical number of elements that will not take up too much space but will not force the array to grow needlessly either". On the other hand, the capacity constructor presupposes that "you know what you're doing" or "you know what you will be using the ArrayList for". Therefore, no heuristics of this type are applied.



Answer 5:

but why would one inflate to 10 (a lot more than I asked for) and the other one to 1 (exactly as much as I requested)

Probably because most people that create lists want to store more than 1 element in it.

You know, when you want exactly one entry, why not use Collections.singletonList(), for example?

In other words, I think the answer is pragmatism. When you use the default constructor, the typical use case would be that you are going to add maybe a handful or so of elements quickly.

Meaning: "unknown" is interpreted as "a few", whereas "exactly 0 (or 1)" is interpreted "hmm, exactly 0 or 1".
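For the exactly-one-element case mentioned above, `Collections.singletonList` gives an immutable list with no spare slots, while an ArrayList with a small explicit capacity is the closest mutable equivalent. A short demonstration (the class name `OneElement` is arbitrary):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OneElement {
    public static void main(String[] args) {
        // Immutable list holding exactly one element (List.of("one") is similar on Java 9+).
        List<String> single = Collections.singletonList("one");
        System.out.println(single); // [one]

        try {
            single.add("two"); // structural changes are not supported
        } catch (UnsupportedOperationException e) {
            System.out.println("singletonList is immutable");
        }

        // If the list must stay mutable, start it with exactly one slot.
        List<String> mutable = new ArrayList<>(1);
        mutable.add("one");
        mutable.add("two"); // allowed; the backing array grows as needed
        System.out.println(mutable); // [one, two]
    }
}
```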



Answer 6:

The capacity with the default constructor is 10 simply because the docs say so. It would have been chosen as a sensible compromise between not using up too much memory off the bat, and not having to perform lots of array copies when adding the first few elements.

The zero behaviour is slightly speculative, but I'm fairly confident with my reasoning here:

It's because if you explicitly initialise an ArrayList with a size of zero, then add something to it, you're saying "I'm not expecting this list to hold much, if anything at all." It therefore makes much, much more sense to grow the backing array slowly, as though it was initialised with a value of 1, rather than treating it as if it had no initial value specified at all. So it handles the special case of growing it to just 1 element, and then carries on as normal.

To then complete the picture, an ArrayList explicitly initialised with a size of 1 would be expected to grow much more slowly (up to the point it hits the default "10 element" size) than the default one, otherwise there'd be no reason to initialise it with a small value in the first place.