List with amortized constant or logarithmic insert

2019-06-25 08:37发布

问题:

Everybody knows (or must know), that it is impossible to design a list data structure that supports both O(1) insertion in the middle, and O(1) lookup.

For instance, linked list support O(1) insertion, but O(N) for lookup, while arrays support O(1) for lookup, but O(N) for insertion (possibly amortized O(1) for insertion at the beginning, the end, or both).

However, suppose you are willing to trade O(1) insertion with:

  1. Amortized O(1) insertion
  2. O(log(N)) insertion

Then what is the theoretical bound for lookup in each of these cases? Do you know existing data structures? What about memory complexity?

回答1:

These are still open problems, but the best bounds that I am aware of are from Arne Andersson's Sublogarithmic searching without multiplications, which has insertions, deletions, and lookups of O(sqrt(lg(n))). However, this comes at a cost of 2^k additional space, where k is the number of bits in the integers being stored in the data structure, hence the reason that we're still using balanced binary trees instead of Andersson's data structure. A variant of the data structure allows O(1) lookups, but then the additional space increases to n2^k where n is the number of elements in the data structure. A randomized variant doesn't use any additional space, but then the sqrt(lg(n)) insertion/deletion/lookup times become average space times instead of worst case times.



回答2:

Tree-based data structures, like a rope or finger tree, can often provide logarithmic insertion time at arbitrary positions. The tradeoff is in access time, which tends to also be logarithmic except in special cases, like the ends of a finger tree.

Dynamic arrays can provide amortized constant insertion at the ends, but insertion in the middle requires copying part of the array, and is O(N) in time, as you mention.

It's probably possible to implement a data structure which supports amortized constant middle insertion. If adding to either end, treat as a dynamic array. If inserting in the middle, keep the old array and add a new array "above" it which contains the new "middle" of the list, using the old array for data which is left or right of the middle. Access time would be logarithmic after your first middle insertion, and keeping track of what data was in which layer would quickly get complicated.

This might be the 'tiered' dynamic array mentioned in the wikipedia article, I haven't researched it further.

I suspect the reason no one really uses a data structure like that is that inserting in the middle is infrequently the case you most need to most for, and logarithmic insert (using trees) is good enough for most real world cases.