I'm trying to understand this paper: Stable minimum space partitioning
in linear time.
It seems that a critical part of the claim is that
Algorithm B sorts stably a bit-array of size n in
O(nlog2n) time and constant extra space, but makes only O(n) moves.
However, the paper doesn't describe the algorithm, but only references another paper which I don't have access to. I can find several ways to do the sort within the time bounds, but I'm having trouble finding one that guarantees O(N) moves without also requiring more than constant space.
What is this Algorithm B? In other words, given
boolean Predicate(Item* a); //returns result of testing *a for some condition
is there a function B(Item* a, size_t N);
which stably sorts a using Predicate as the sort key with fewer than nlog2n calls to Predicate, and performs only O(N) writes to a?
I'm tempted to say that it isn't possible. Anytime you're computing O(n log n) amount of information but have (1) nowhere to stash it (constant space), and (2) nowhere to immediately use it (O(n) moves), there is something weird going on, possibly involving heavy use of the stack (which may not be included in the space analysis, although it should be).
It might be possible if you store temporary information inside many bits of just one integer, or something squirrelly like that. (So O(1) in practice, but O(log n) in theory.)
Radix sorting wouldn't quite do it because you'd have to call the predicate to create the digits, and if you don't memoize the transitivity of comparison then you'll call it O(n^2) times. (But to memoize it takes O(log n) amortized space per item, I think.)
QED - Proof by lack of imagination :)
Here's what I have so far. A version of cycle sort which uses a bit array to hold the result of the partition tests and calculates the destinations on the fly. It performs a stable binary partition with N compares, < N swaps, and exactly 2N bits of allocated storage.
int getDest(int i, BitArray p, int nz)
{ bool b=BitArrayGet(p,i);
int below = BitArrayCount1sBelow(p,i); //1s below
return (b)?(nz+below):i-below;
}
int BinaryCycleSort(Item* a, int n, BitArray p)
{
int i, numZeros = n-BitArrayCount1sBelow(p,n);
BitArray final = BitArrayNew(n);
for (i=0;i<numZeros;i++)
if (!BitArrayGet(final,i))
{ int dest= GetDest(i,p,numZeros);
while (dest!=i)
{ SwapItem(a+i,a+dest);
BitArraySet(final,dest);
dest = getDest(dest,p,numZeros);
}
BitArraySet(final,dest);
}
return numZeros;
}
int BinaryPartition(Item* a, int n, Predicate pPred)
{
int i;
BitArray p = BitArrayNew(n);
for (i=0;i<n;i++)
if (pPred(a+i)) BitArraySet(p,i);
return BinaryCycleSort(a,n,p);
}
using these helpers:
typedef uint32_t BitStore;
typedef BitStore* BitArray;
BitArray BitArrayNew(int N); //returns array of N bits, all cleared
void BitArraySet(BitArray ba, int i); //sets ba[i] to 1
bool BitArrayGet(BitArray ba, int i); //returns ba[i]
int BitArrayCount1sBelow(BitArray ba, int i) //counts 1s in ba[0..i)
Obviously this is not constant space. But I think this might be used as a building block to the ultimate goal. The whole array can be partitioned into N/B blocks using a fixed-size BitArray of B bits. Is there some way to re-use those same bits while performing a stable merge?