In .NET 4.0+, a class SortedSet<T>
has a method called GetViewBetween(l, r)
, which returns an interface view on a tree part containing all the values between the two specified. Given that SortedSet<T>
is implemented as a red-black tree, I naturally expect it to run in O(log N)
time. The similar method in C++ is std::set::lower_bound/upper_bound
, in Java it's TreeSet.headSet/tailSet
, and they are logarithmic.
However, that is not true. The following code runs in 32 sec, whereas the equivalent O(log N)
version of GetViewBetween
would make this code run in 1-2 sec.
var s = new SortedSet<int>();
int n = 100000;
var rand = new Random(1000000007);
int sum = 0;
for (int i = 0; i < n; ++i) {
s.Add(rand.Next());
if (rand.Next() % 2 == 0) {
int l = rand.Next(int.MaxValue / 2 - 10);
int r = l + rand.Next(int.MaxValue / 2 - 10);
var t = s.GetViewBetween(l, r);
sum += t.Min;
}
}
Console.WriteLine(sum);
I decompiled System.dll using dotPeek and here's what I got:
public TreeSubSet(SortedSet<T> Underlying, T Min, T Max, bool lowerBoundActive, bool upperBoundActive)
: base(Underlying.Comparer)
{
this.underlying = Underlying;
this.min = Min;
this.max = Max;
this.lBoundActive = lowerBoundActive;
this.uBoundActive = upperBoundActive;
this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
this.count = 0;
this.version = -1;
this.VersionCheckImpl();
}
internal SortedSet<T>.Node FindRange(T from, T to, bool lowerBoundActive, bool upperBoundActive)
{
SortedSet<T>.Node node = this.root;
while (node != null)
{
if (lowerBoundActive && this.comparer.Compare(from, node.Item) > 0)
{
node = node.Right;
}
else
{
if (!upperBoundActive || this.comparer.Compare(to, node.Item) >= 0)
return node;
node = node.Left;
}
}
return (SortedSet<T>.Node) null;
}
private void VersionCheckImpl()
{
if (this.version == this.underlying.version)
return;
this.root = this.underlying.FindRange(this.min, this.max, this.lBoundActive, this.uBoundActive);
this.version = this.underlying.version;
this.count = 0;
base.InOrderTreeWalk((TreeWalkPredicate<T>) (n =>
{
SortedSet<T>.TreeSubSet temp_31 = this;
int temp_34 = temp_31.count + 1;
temp_31.count = temp_34;
return true;
}));
}
So, FindRange
is obviously O(log N)
, but after that we call VersionCheckImpl
... which does a linear-time traversal of the found subtree only for recounting its nodes!
- Why would you need to do that traversal all the time?
- Why .NET does not contain a
O(log N)
method for splitting a tree based on key, like C++ or Java? It is really helpful in lots of situations.
about the
version
fieldUPDATE1:
In my memory, a lot of(maybe all?) collections in BCL have the field
version
.First,about
foreach
:according to this msdn link
In many other collections,
version
is protected the data is not modified during theforeach
For example,
HashTable
'sMoveNext()
:But in the in the
SortedSet<T>
'sMoveNext()
method:UPDATE2:
But the O(N) loop maybe not only for
version
but also for theCount
property.Because the MSDN of GetViewBetween said:
So for every update it should be sync the
count
field (key and value are already same). To make sure theCount
is correctThere were two policies to reach the target:
First.MS's,in their code, they sacrifice the
GetViewBetween()
's performance and win theCount
Property's performance.VersionCheckImpl()
is one way to sync theCount
property.Second,Mono. In mono's code,
GetViewBetween()
is Faster, but in theirGetCount()
method:It is always an O(N) operation!