Advantage of 'one dimensional' hash over a

2019-07-01 17:15发布

问题:

I was just wondering about the efficiency of using a one dimensional hash (i.e. only keys, no values - we don't care about them anyway) over a one dimensional array.

The main reason I wanted to use a hash for this purpose is so I can use the exists function to see if an 'entry' already exists. Hashes are also great for not duplicating keys right? With arrays, I would need to set up my own checks involving grep which I'm led to believe would be slower.

This hash/array will then be iterated through for certain operations.

I'd love to hear any insight on this, and thanks in advance!

回答1:

exists $hash{ $key } 

is a nice, short expression, clear and easy to use. Obviously

!!grep { $_ eq $key } @array

Is not quite as short, but

$key ~~ @array # smart match

is even shorter. So since 5.10, it just as syntactically easy to test the smart match as the exists.

So guessing from the performance difference between arrays and hashes, I can imagine that the smart match will perform faster for a small list of items, but the hash will by far outperform the array lookup with a large list of items.

However, you should Benchmark the performance anyway.


And this is why. On Strawberry perl, with even a list size 1, the hash look-up outperforms the string match:

array_lookup  577701/s           --         -46%
hash_lookup  1068376/s          85%           --

With 2 items in the list:

array_lookup  464684/s           --         -57%
hash_lookup  1068376/s         130%           --

And with 20 items:

array_lookup  181554/s           --         -83%
hash_lookup  1068376/s         488%           --

I would use the hash.



回答2:

Yes. If you want to check whether an item exists in an array, you have to loop over every item. In a hash table, you simply lookup the key. This is faster for big datasets.



回答3:

In a mathematical sense, hash keys are sets and arrays are tuples. For example, the tuples ('apple', 'banana') and ('banana', 'apple') are different entities, whereas the sets {'apple', 'banana'} and {'banana', 'apple'} are the same.

You should use sets when you need sets and tuples when you need tuples.

If you need to perform set operations, you might want to use Set::Object, rather than writing the operations from scratch every time.

If you are going to use hash keys to represent a set, setting the values to undef rather than 1 can reduce the memory footprint which might matter if your sets are large.



回答4:

This is absolutely a standard technique in Perl. My own scripts are full of code like this:

my @elements = (...);
my %is_element = map { $_ => 1 } @elements;

There are plenty of examples on Stack Overflow, e.g. here and here.

Looking up a key in a hash is approximately O(1).