Is it possible to use a Perl hash in a manner that has O(log(n))
lookup and insertion?
By default, I assume the lookup is O(n)
since it's represented by an unsorted list.
I know I could create a data structure to satisfy this (e.g., a tree); however, it would be nicer if it were built in and could be used as a normal hash (i.e., with %).
Associative arrays in Perl 5 are implemented with hash tables, which have amortized O(1) (i.e. constant-time) insert and lookup. That is why we tend to call them hashes, not associative arrays.
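So nothing extra is needed: the plain built-in %hash already gives amortized O(1) behavior. For example:

```perl
use strict;
use warnings;

my %age;                   # an ordinary Perl hash (a hash table underneath)
$age{alice} = 30;          # amortized O(1) insert
$age{bob}   = 25;

print "$age{alice}\n";                       # O(1) lookup, prints 30
print exists $age{carol} ? "yes\n" : "no\n"; # prints no
```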
It is difficult to find documentation stating that Perl 5 uses a hash table to implement associative arrays (beyond the fact that we call them "hashes"), but there is at least this in
perldoc perlfaq4
An even better quote from
perldoc perldata:
Of course, O(1) is only theoretical performance. In the real world we do not have perfect hashing functions, so hashes do slow down as they get larger, and there are some degenerate cases that can turn a hash into O(n), but Perl does its best to prevent this from happening. Here is a benchmark of Perl hashes with 10, 100, 1,000, 10,000, and 100,000 keys:
Here is the benchmark code:
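A benchmark along these lines can be written with the core Benchmark module; this is a sketch (the original code was not preserved, so the exact setup is assumed):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Compare lookup speed in hashes of increasing size. If lookup is
# amortized O(1), the rates should stay roughly flat as size grows.
my %tests;
for my $size (10, 100, 1_000, 10_000, 100_000) {
    my %h = map { $_ => 1 } 1 .. $size;
    $tests{ sprintf "%6d keys", $size } = sub {
        my $v = $h{ int(rand($size)) + 1 };    # look up a random existing key
    };
}
cmpthese(200_000, \%tests);
```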
A Perl hash is a hash table, so it already has amortized O(1) insertion and lookup.
Anybody who thinks that hash insert or lookup time is O(1) on modern hardware is extraordinarily naive. Measuring repeated gets of the same value is plainly wrong, since that key stays hot in the CPU cache. The following results give a much better picture of what is going on:
The results above were measured on a system with a 5 MB CPU cache. Note that performance drops significantly, from 3.5M/s to 1M/s lookups, once the working set no longer fits in cache. Even so it is still very fast, and in some cases you can beat even systems like an RDBMS if you know what you are doing. You can measure your own system using the following code:
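A sketch of such a measurement (the original code was not preserved; the key counts and lookup count here are assumptions, and the 3.5M/s vs. 1M/s figures came from the answerer's own machine):

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time random lookups in hashes of growing size, so the working set
# eventually exceeds the CPU cache and the lookup rate drops.
for my $n (1_000, 100_000, 1_000_000) {
    my %h = map { $_ => $_ } 1 .. $n;
    my $lookups = 500_000;
    my ($sum, $t0) = (0, time);
    $sum += $h{ int(rand($n)) + 1 } for 1 .. $lookups;
    my $rate = $lookups / (time - $t0);
    printf "%9d keys: %.2fM lookups/s\n", $n, $rate / 1e6;
}
```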
You also should not forget that hash lookup time depends on key length. Put simply, no real technology offers true O(1) access time; every known real technology has at best O(log N) access time. The only systems with apparent O(1) access achieve it by limiting their maximum N, and they degrade performance for small N. That is how things work in the real world, and it is why people build structures such as Judy arrays.