How to convert a Perl hash-of-hashes to a more fle

Posted 2019-02-26 03:52

Question:

In a quick-and-dirty Perl script, I have a data structure like this:

$tax_revenue{YEAR}{STATE}{GOVLEV}{TAX} = integer

The hash keys assume values like this:

YEAR: 1900 .. 2000
STATE: AK, AL, ... WY
GOVLEV: state, local
TAX: type of tax (income, sales, etc.)

In addition, the hash keys are unique across parameters. For example, no value for the TAX parameter collides with a value for any other parameter.

I am starting a medium-sized project working with this data and I would like to implement the data structure in a more flexible way. I don't know all of the data-retrieval functionality I will need yet, but here are some examples:

# Specify the parameters in any order.
Tax_rev( qw(1902 WY state property) );
Tax_rev( qw(state property 1902 WY) );

# Use named parameters.
Tax_rev(year => 1902, state => 'WY', govlev => 'state', tax => 'property');

# Use wildcards to obtain a list of values.
# For example, state property tax revenue in 1902 for all states.
Tax_rev( qw(1902 * state property) );

My initial inclination was to keep storing the data as a hash-of-hashes and to build one or more utility functions (probably as part of a class) to retrieve the values. But then I wondered whether there is a better strategy -- some way of storing the underlying data other than a hash-of-hashes. Any advice about how to approach this problem would be appreciated.

Answer 1:

If you want a pure Perl implementation, you could build an array of hashes:

my @taxdata = (
    { year => 1902, state => 'WY', level => 'state', type => 'property', amount => 500 },
    # ...
);

my @matches = grep {
    $_->{year}  == 1902    &&
    $_->{level} eq 'state' &&
    $_->{type}  eq 'property'
} @taxdata;

That's flexible if you want to run arbitrary queries against it, but every lookup scans the whole array, so it is slow when you only need one specific record.
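To tie this back to the named-parameter and wildcard calls in the question, here is a minimal sketch of a generic filter over the array of hashes. The name find_records and the sample amounts are made up for illustration; a criterion that is omitted or set to '*' is treated as "match anything".

```perl
use strict;
use warnings;

# Sample rows; the amounts are invented for this sketch.
my @taxdata = (
    { year => 1902, state => 'WY', level => 'state', type => 'property', amount => 500 },
    { year => 1902, state => 'AL', level => 'state', type => 'property', amount => 300 },
    { year => 1902, state => 'WY', level => 'local', type => 'sales',    amount => 200 },
    { year => 1903, state => 'WY', level => 'state', type => 'property', amount => 550 },
);

# Hypothetical helper: return every record whose fields match the given
# criteria. A criterion that is missing or '*' matches any value.
sub find_records {
    my ($records, %want) = @_;

    # Only the fields that actually constrain the search.
    my @fields = grep { defined $want{$_} && $want{$_} ne '*' } keys %want;

    return grep {
        my $rec = $_;
        # Keep the record if no constrained field disagrees with it.
        !grep { !defined $rec->{$_} || $rec->{$_} ne $want{$_} } @fields;
    } @$records;
}

# State property tax revenue in 1902 for all states.
my @matches = find_records(
    \@taxdata,
    year => 1902, state => '*', level => 'state', type => 'property',
);

printf "%s %d\n", $_->{state}, $_->{amount} for @matches;
```

This keeps the linear scan, so it trades speed for flexibility in the same way as the grep above.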

A better solution might be a database with a single table where each row contains the fields you listed. Then you could write an SQL query to extract data according to arbitrary criteria. You can use the DBI module to handle the connection.



Answer 2:

Please consider putting the data in an SQLite database. Then you have the flexibility of running whatever query you want (via DBI or just the SQLite command-line interface) and getting data structures that are suitable for generating reports: taxes by state, states by taxes, taxes for a given year for all states whose names begin with the letter 'W', and so on. I presume the data are already in some kind of character-separated format (tab, comma, pipe, etc.) and can therefore be bulk imported into an SQLite DB easily, saving some work and code on that end.
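As a rough sketch of what that could look like from Perl, assuming the DBI and DBD::SQLite modules are installed: the table name tax_revenue, the column names, and the sample amounts below are assumptions chosen to mirror the hash keys in the question, not anything prescribed by SQLite.

```perl
use strict;
use warnings;
use DBI;

# In-memory database for the sketch; a real script would pass a file name.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# One row per (year, state, govlev, tax) combination.
$dbh->do(<<'SQL');
CREATE TABLE tax_revenue (
    year   INTEGER NOT NULL,
    state  TEXT    NOT NULL,
    govlev TEXT    NOT NULL,
    tax    TEXT    NOT NULL,
    amount INTEGER NOT NULL,
    PRIMARY KEY (year, state, govlev, tax)
)
SQL

# Invented sample rows.
my $insert = $dbh->prepare('INSERT INTO tax_revenue VALUES (?, ?, ?, ?, ?)');
$insert->execute(@$_) for (
    [ 1902, 'WY', 'state', 'property', 500 ],
    [ 1902, 'AL', 'state', 'property', 300 ],
    [ 1902, 'WY', 'local', 'sales',    200 ],
);

# State property tax revenue in 1902 for all states.
my $rows = $dbh->selectall_arrayref(
    'SELECT state, amount FROM tax_revenue
      WHERE year = ? AND govlev = ? AND tax = ?
      ORDER BY state',
    { Slice => {} },    # return each row as a hash reference
    1902, 'state', 'property',
);

print "$_->{state}: $_->{amount}\n" for @$rows;
```

Leaving a column out of the WHERE clause plays the role of the '*' wildcard from the question.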



Answer 3:

I would advise you to look into an object system such as Moose. The learning curve isn't too steep (or steep at all) and the benefits will be enormous. You'd start with something like:

package MyApp;

use Moose; # use strict automagically in effect

has 'year'   => ( is => 'ro', isa => 'Int', required => 1 );
has 'state'  => ( is => 'ro', isa => 'Str', required => 1 );
has 'govlev' => ( is => 'ro', isa => 'Str', required => 1 );
has 'tax'    => ( is => 'ro', isa => 'Str', required => 1 );

Then in your main program:

use MyApp;

my $obj = MyApp->new(
    year   => 2000,
    state  => 'AK',
    govlev => 'local',
    tax    => 'revenue'
);

# ...

With the flexibility of MooseX::Types you can go on to declare your own type classes, with enums, etc.
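For a feel of what an enum constraint looks like, here is a small sketch that uses Moose::Util::TypeConstraints (which ships with Moose) rather than MooseX::Types. The class name TaxRecord and the type name GovLevel are hypothetical, and Moose is assumed to be installed.

```perl
use strict;
use warnings;

package TaxRecord;    # hypothetical class name
use Moose;
use Moose::Util::TypeConstraints;

# Only these two values are accepted for govlev.
enum 'GovLevel', [qw(state local)];

has 'year'   => ( is => 'ro', isa => 'Int',      required => 1 );
has 'state'  => ( is => 'ro', isa => 'Str',      required => 1 );
has 'govlev' => ( is => 'ro', isa => 'GovLevel', required => 1 );
has 'tax'    => ( is => 'ro', isa => 'Str',      required => 1 );

no Moose;
__PACKAGE__->meta->make_immutable;

package main;

# A valid record.
my $ok = TaxRecord->new(
    year => 1902, state => 'WY', govlev => 'state', tax => 'property',
);

# The constructor dies on a value outside the enum, so wrap it in eval.
my $bad = eval {
    TaxRecord->new(
        year => 1902, state => 'WY', govlev => 'federal', tax => 'property',
    );
};
print "rejected: govlev 'federal' is not in the enum\n" unless $bad;
```

The same approach could constrain state or tax to a fixed list, at the cost of maintaining those lists in code.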

Once you go Moose, you never look back :)



Answer 4:

Check out Data::Diver: "Simple, ad-hoc access to elements of deeply nested structures". It seems to do exactly what you want from Tax_rev:

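use Data::Diver qw( Dive );

my %tax_revenue;
# ...
$tax_revenue{1900}{NC}{STATE}{SALES} = 1000;
# ...

# Dive follows the keys in order; a missing key anywhere yields undef.
my $found   = Dive( \%tax_revenue, qw( 1900 NC STATE SALES ) );   # 1000
my $missing = Dive( \%tax_revenue, qw( 1901 NC STATE SALES ) );   # undef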
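Dive follows one fixed path of keys. For the wildcard lookups in the question, one option is a small recursive walk over the same nested hash. The sketch below is plain Perl and does not use Data::Diver; the name dive_wild and the sample amounts are made up for illustration.

```perl
use strict;
use warnings;

# Invented sample data in the question's YEAR/STATE/GOVLEV/TAX layout.
my %tax_revenue;
$tax_revenue{1902}{WY}{state}{property} = 500;
$tax_revenue{1902}{AL}{state}{property} = 300;
$tax_revenue{1902}{WY}{local}{sales}    = 200;

# Hypothetical helper: walk the nested hash along @path, where '*' matches
# every key at that level. Returns a list of [ \@keys, $value ] pairs,
# one per matching leaf, with wildcard keys visited in sorted order.
sub dive_wild {
    my ($node, @path) = @_;

    return ( [ [], $node ] ) unless @path;    # path consumed: this is a hit
    return unless ref $node eq 'HASH';        # path is longer than the data

    my ($want, @rest) = @path;
    my @keys = $want eq '*'          ? ( sort keys %$node )
             : exists $node->{$want} ? ($want)
             :                         ();

    my @found;
    for my $key (@keys) {
        # Prepend this level's key to every hit found further down.
        push @found, map { [ [ $key, @{ $_->[0] } ], $_->[1] ] }
                     dive_wild( $node->{$key}, @rest );
    }
    return @found;
}

# State property tax revenue in 1902 for all states.
for my $hit ( dive_wild( \%tax_revenue, qw(1902 * state property) ) ) {
    my ($keys, $value) = @$hit;
    print join(' ', @$keys), " => $value\n";
}
```

Because exists is checked before descending, a failed lookup does not autovivify empty hashes in the data.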


Answer 5:

If you aren't going to use objects, I think that data structure will work just fine.

Here is an example of Tax_rev(). It isn't full-featured, but you can give it the four arguments in any order. If you actually use it, you might want to validate the inputs.

my $result = Tax_rev( \%data, qw(state property 1902 WY) );

use strict;
use warnings;
use 5.010;