I was just wondering about the efficiency of using a one dimensional hash (i.e. only keys, no values - we don't care about them anyway) over a one dimensional array.
The main reason I wanted to use a hash for this purpose is so I can use the exists function to see if an 'entry' already exists. Hashes are also great for not duplicating keys right? With arrays, I would need to set up my own checks involving grep which I'm led to believe would be slower.
This hash/array will then be iterated through for certain operations.
I'd love to hear any insight on this, and thanks in advance!
This is absolutely a standard technique in Perl. My own scripts are full of code like this:
There are plenty of examples on Stack Overflow, e.g. here and here.
Looking up a key in a hash is approximately O(1).
In a mathematical sense, hash keys are sets and arrays are tuples. For example, the tuples
('apple', 'banana')
and('banana', 'apple')
are different entities, whereas the sets{'apple', 'banana'}
and{'banana', 'apple'}
are the same.You should use sets when you need sets and tuples when you need tuples.
If you need to perform set operations, you might want to use Set::Object, rather than writing the operations from scratch every time.
If you are going to use hash keys to represent a set, setting the values to
undef
rather than1
can reduce the memory footprint which might matter if your sets are large.is a nice, short expression, clear and easy to use. Obviously
Is not quite as short, but
is even shorter. So since 5.10, it just as syntactically easy to test the smart match as the
exists
.So guessing from the performance difference between arrays and hashes, I can imagine that the smart match will perform faster for a small list of items, but the hash will by far outperform the array lookup with a large list of items.
However, you should Benchmark the performance anyway.
And this is why. On Strawberry perl, with even a list size 1, the hash look-up outperforms the string match:
With 2 items in the list:
And with 20 items:
I would use the hash.
Yes. If you want to check whether an item exists in an array, you have to loop over every item. In a hash table, you simply lookup the key. This is faster for big datasets.