When is it better to use an array instead of a hash

Posted 2019-04-20 18:38

Question:

Say you have an array @a = qw/ a b c d/;

and a hash %a = ('a' => 1, 'b' => 1, 'c' => 1, 'd' => 1);

Is there any situation where creating the array version is better than creating the hash (other than when you have to iterate over all the values as in something like

for (@a){
    ...
}

in which case you would have to use keys %a if you went with the hash)? Because testing whether a specific value is in a hash is always more efficient than doing so in an array, correct?
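For reference, the two membership tests being compared might look like the sketch below. The variable names $in_array and $in_hash are made up for illustration. Note that building the hash is itself a full pass over the data, so the hash only pays off when you do more than one lookup.

```perl
use strict;
use warnings;

my @a = qw/ a b c d /;
my %a = map { $_ => 1 } @a;    # one pass over the array to build the hash

# Array: grep visits every element, so the cost grows with the array size.
my $in_array = grep { $_ eq 'c' } @a;    # count of matches in scalar context

# Hash: exists is a single keyed lookup, constant time on average.
my $in_hash = exists $a{c};

print "array: ", ($in_array ? "found" : "missing"), "\n";
print "hash:  ", ($in_hash  ? "found" : "missing"), "\n";
```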

Answer 1:

    • Arrays are indexed by numbers.
    • Hashes are keyed by strings.
    • All indexes up to the highest index exist in an array.
    • Hashes are sparsely indexed. (e.g. "a" and "c" can exist without "b".)

There are many emergent properties. Primarily,

    • Arrays can be used to store ordered lists.
    • It would be ugly and inefficient to use hashes that way.
    • It's not possible to delete an element from an array unless it's the highest indexed element.
    • You can delete from an ordered list implemented using an array, though it is inefficient to remove elements other than the first or last.
    • It's possible to delete an element from a hash, and it's efficient.
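As a rough illustration of the deletion points above (the data and variable names are made up): removing from the middle of an array means every later element has to shift down, while removing a hash key touches only that key.

```perl
use strict;
use warnings;

my @list = qw(a b c d);
my %set  = (a => 1, b => 1, c => 1, d => 1);

# Ordered list in an array: splice removes 'b' and shifts 'c' and 'd' down.
splice @list, 1, 1;

# Removing from either end is cheap by comparison.
my $last = pop @list;      # removes 'd'

# Hash: delete removes just the one key; nothing else moves.
delete $set{b};

print "list: @list\n";
print "keys: ", join(' ', sort keys %set), "\n";
```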


Answer 2:

Arrays are ordered lists of values. They can contain duplicate values.

@array = qw(a b c a);

Hashes are a mapping between a key (which must be unique) and a value (which can be duplicated). Hashes are (effectively) unordered, which means that keys come out in apparently random order rather than the order in which they are entered.

%hash = (a => 1, b => 2, c => 3);

Hashes can also be used as sets when only the key matters. Sets are unordered and contain only unique "values" (the hash's keys).

%set = (a => undef, b => undef, c => undef);

Which one to use depends on your data and algorithm. Use an array when order matters (particularly if you can't sort to derive the order) or if duplicate values are possible. Use a set (i.e. use a hash as a set) when values must be unique and you don't care about order. Use a hash when uniqueness matters, order doesn't (or is easily sortable), and look-ups are based on arbitrary values rather than integers.
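The two often work together. A minimal sketch, assuming you want the unique values of an array while keeping first-seen order (the names %seen and @unique are just for illustration): the hash acts as the set, the array keeps the order.

```perl
use strict;
use warnings;

my @array = qw(a b c a);

# %seen is used as a set; grep keeps a value only the first time it appears.
my %seen;
my @unique = grep { !$seen{$_}++ } @array;

print "@unique\n";
```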

You can combine arrays and hashes (via references) to create arbitrarily complex data structures.

@aoa = ([1, 2, 3], [4, 5, 6]);               # array of arrays ("2D" array)
%hoh = (a => { x => 1 }, b => { x => 2 });   # hash of hashes
@aoh = ({a => 1, b => 2}, {a => 3, b => 4}); # array of hashes
%hoa = (a => [1, 2], b => [3, 4]);           # hash of arrays
...etc.
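Reading values back out of those structures might look like this. It is only a sketch, and the scalar names on the left are invented for the example.

```perl
use strict;
use warnings;

my @aoa = ([1, 2, 3], [4, 5, 6]);
my %hoh = (a => { x => 1 }, b => { x => 2 });
my @aoh = ({a => 1, b => 2}, {a => 3, b => 4});
my %hoa = (a => [1, 2], b => [3, 4]);

my $cell     = $aoa[1][2];       # second row, third column
my $x_of_b   = $hoh{b}{x};       # inner hash under key 'b'
my $second_a = $aoh[1]{a};       # key 'a' of the second hash
my @b_values = @{ $hoa{b} };     # dereference the whole inner array

print "$cell $x_of_b $second_a @b_values\n";
```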


Answer 3:

This is about using numbers as hash keys. It doesn't answer the question directly as it doesn't compare the facilities that arrays provide, but I thought it would be a good place to put the information.

Suppose a hash with ten elements is built using code like this

use strict;
use warnings;

my %hash;
my $n = 1000;
for (1 .. 10) {
  $hash{$n} = 1;
  $n *= 1000;
}

and then we query it, looking for keys that are powers of ten. Of course the easiest way to multiply an integer by ten is to add a zero, so it is fine to write

my $m = '1';

for (1 .. 100) {
  print $m, "\n" if $hash{$m};
  $m .= 0;
}

which has the output

1000
1000000
1000000000
1000000000000
1000000000000000
1000000000000000000

We entered ten elements but this shows only six. What has happened? Let's take a look at what's in the hash.

use Data::Dump;
dd \%hash;

and this outputs

{
  "1000"                => 1,
  "1000000"             => 1,
  "1000000000"          => 1,
  "1000000000000"       => 1,
  "1000000000000000"    => 1,
  "1000000000000000000" => 1,
  "1e+021"              => 1,
  "1e+024"              => 1,
  "1e+027"              => 1,
  "1e+030"              => 1,
}

so the hash doesn't use the keys that we imagined. It stringifies the numbers in a way that it would be foolish to try to emulate.
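One way to sidestep this, sketched here as an assumption rather than as the only fix, is to build the keys as strings in the first place, so that Perl never has to convert a number to a string. The variable names $key and @found are made up for the example.

```perl
use strict;
use warnings;

my %hash;
my $key = '1000';          # a string, never used as a number
for (1 .. 10) {
    $hash{$key} = 1;
    $key .= '000';         # append digits instead of multiplying by 1000
}

# Same query as before: powers of ten built by appending zeros.
my @found;
my $m = '1';
for (1 .. 100) {
    push @found, $m if $hash{$m};
    $m .= '0';
}

print scalar(@found), "\n";    # prints 10
```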

For a slightly more practical example, say we had some circles and wanted to collect them into sets by area. The obvious thing is to use the area as a hash key, like this program which creates 100,000 circles with random integer radii up to 18 million.

use strict;
use warnings;
use 5.010;

package Circle;

use Math::Trig 'pi';

sub new {
  my $class = shift;
  my $self = { radius => shift };
  bless $self, $class;
}

sub area {
  my $self = shift;
  my $radius = $self->{radius};
  pi * $radius * $radius;
}



package main;

my %circles;

for (1 .. 100_000) {
   my $circle = Circle->new(int rand 18_000_000);
   push @{ $circles{$circle->area} }, $circle;
}

Now let's see how many of those hash keys use scientific notation

say scalar grep /e/, keys %circles;

which says (randomly, of course)

861

so there really isn't a tidy way of knowing what string perl will use if we specify a number as a hash key.
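A possible workaround for this particular case, offered as a sketch and not part of the program above: circles with the same radius are exactly the ones with the same area, so the integer radius can serve as the key. Integers of this size stringify as plain digits. The sample radii and the name %by_radius are invented for illustration.

```perl
use strict;
use warnings;

my @radii = (3, 12_000_000, 3, 17_999_999);    # sample data only

# Group by the integer radius instead of the floating-point area.
my %by_radius;
push @{ $by_radius{$_} }, { radius => $_ } for @radii;

# Count keys that ended up in scientific notation.
my @sci = grep { /e/ } keys %by_radius;

print scalar(@sci), "\n";    # prints 0
```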



Answer 4:

In Perl an @array is an ordered list of values ($v1, $v2, ...) accessed by an integer index (negative indices count back from the end), while a %hash is an unordered list of 'key => value' pairs (k1 => $v1, k2 => $v2, ...) accessed by a string.
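A small sketch of both access styles, with made-up data:

```perl
use strict;
use warnings;

my @array = qw(first second third);
my %hash  = (k1 => 'v1', k2 => 'v2');

my $head = $array[0];      # positive index counts from the start
my $tail = $array[-1];     # negative index counts from the end
my $v2   = $hash{k2};      # hash values are fetched by string key

# A hash has no inherent order, so sort the keys when order matters.
my @ordered_keys = sort keys %hash;

print "$head $tail $v2 @ordered_keys\n";
```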

There are modules on CPAN that implement ordered hashes, like: Hash::Ordered and Tie::IxHash

You might want to use an array when you have ordered 'items', presumably a great number of them, for which using a %hash and sorting the keys and/or the values would be inefficient.



Tags: arrays perl hash