Why ever use an array instead of a hash?

2019-06-06 05:20发布

问题:

I have read that it is much faster to iterate through a hash than through an array. Retrieving values from a hash is also much faster.

Instead of using an array, why not just use a hash and give each key a value corresponding to an index? If the items ever need to be in order, they can be sorted.

回答1:

Retrieving from hash is faster in a sense that you can fetch value directly by key instead of iterating over whole hash (or array when you're searching for particular string). Having that said, $hash{key} isn't faster than $array[0] as no iteration is taking place.

Arrays can't be replaced by hashes, as they have different features,

                       arrays hashes
------------------------------------
ordered keys            x       -
push/pop                x       -
suitable for looping    x       -
named keys              -       x


回答2:

I don't know where you read that hashes are faster than arrays. According to some Perl reference works (Mastering Algorithms with Perl), arrays are faster than hashes (follow this link for some more info).

If speed is your only criterae, you should benchmark to see which technique is going to be faster. It depends on what operations you will be doing onto the array/hash.

Here is an SO link with some further information: Advantage of 'one dimensional' hash over array in Perl



回答3:

I think this is a good question: it's not so much a high level "language design" query so much as it is an implementation question. It could be worded in a way that emphasizes that - say using hashes versus arrays for a particular technique or use case.

Hashes are nice but you need lists/arrays (c.f. @RobEarl). You can use tie (or modules like Tie::IxHash or Tie::Hash::Indexed ) to "preserve" the order of a hash, but I believe these would have to be slower than a regular hash and in some cases you can't pass them around or copy them in quite the same way.



回答4:

This code is more or less how a hash works. It should explain well enough why you would want to use an array instead of a hash.

package DIYHash;
use Digest::MD5;

sub new {
     my ($class, $buckets) = @_;

     my $self = bless [], $class;
     $#$self = $buckets || 32;

     return $self;
}

sub fetch {
    my ( $self, $key ) = @_;

    my $i  = $self->_get_bucket_index( $key );
    my $bo = $self->_find_key_in_bucket($key);

    return $self->[$i][$bo][1];
}

sub store {
    my ( $self, $key, $value ) = @_;

    my $i  = $self->_get_bucket_index( $key );
    my $bo = $self->_find_key_in_bucket($key);
    $self->[$i][$bo] = [$key, $value];

    return $value;
}

sub _find_key_in_bucket {
    my ($self, $key, $index) = @_;

    my $bucket = $self->[$index];

    my $i = undef; 
    for ( 0..$#$bucket ) {
         next unless $bucket->[$_][0] eq $key;
         $i = $_;
    }
    $i = @$bucket unless defined $i;

    return $i;
}


# This function needs to always return the same index for a given key.
# It can do anything as long as it always does that.
# I use the md5 hashing algorithm here.
sub _get_bucket_index {
    my ( $self, $key ) = @_;

    # Get a number from 0 to 1 - bucket count.
    my $index = unpack( "I", md5($key) ) % @$self;

    return $index;
}

1;

To use this amazing cluster of code:

my $hash = DIYHash->new(4);  #This hash has 4 buckets.
$hash->store(mouse => "I like cheese");
$hash->store(cat => "I like mouse");

say $hash->fetch('mouse');

Hashes look like they are constant time, rather than order N because for a given data set, you select a number of buckets that keeps the number of items in any bucket very small.

A proper hashing system would be able to resize as appropriate when the number of collisions gets too high. You don't want to do this often, because it is an order N operation.



标签: arrays perl hash