I was just wondering about the efficiency of using a one dimensional hash (i.e. only keys, no values - we don't care about them anyway) over a one dimensional array.
The main reason I wanted to use a hash for this purpose is so I can use the exists function to see if an 'entry' already exists. Hashes are also great for not duplicating keys right? With arrays, I would need to set up my own checks involving grep which I'm led to believe would be slower.
This hash/array will then be iterated through for certain operations.
I'd love to hear any insight on this, and thanks in advance!
exists $hash{ $key }
is a nice, short expression, clear and easy to use. Obviously
!!grep { $_ eq $key } @array
Is not quite as short, but
$key ~~ @array # smart match
is even shorter. So since 5.10, it just as syntactically easy to test the smart match as the exists
.
So guessing from the performance difference between arrays and hashes, I can imagine that the smart match will perform faster for a small list of items, but the hash will by far outperform the array lookup with a large list of items.
However, you should Benchmark the performance anyway.
And this is why. On Strawberry perl, with even a list size 1, the hash look-up outperforms the string match:
array_lookup 577701/s -- -46%
hash_lookup 1068376/s 85% --
With 2 items in the list:
array_lookup 464684/s -- -57%
hash_lookup 1068376/s 130% --
And with 20 items:
array_lookup 181554/s -- -83%
hash_lookup 1068376/s 488% --
I would use the hash.
Yes. If you want to check whether an item exists in an array, you have to loop over every item. In a hash table, you simply lookup the key. This is faster for big datasets.
In a mathematical sense, hash keys are sets and arrays are tuples. For example, the tuples ('apple', 'banana')
and ('banana', 'apple')
are different entities, whereas the sets {'apple', 'banana'}
and {'banana', 'apple'}
are the same.
You should use sets when you need sets and tuples when you need tuples.
If you need to perform set operations, you might want to use Set::Object, rather than writing the operations from scratch every time.
If you are going to use hash keys to represent a set, setting the values to undef
rather than 1
can reduce the memory footprint which might matter if your sets are large.
This is absolutely a standard technique in Perl. My own scripts are full of code like this:
my @elements = (...);
my %is_element = map { $_ => 1 } @elements;
There are plenty of examples on Stack Overflow, e.g. here and here.
Looking up a key in a hash is approximately O(1).