Parsing a CSV file using array of hash of hashes i

Posted 2019-08-09 09:18

Question:

I have CSV data in this form:

Sl.No, Label, Type1, Type2...
1, "label1", Y, N, N...
2, "label2", N, Y, Y...
...

Here "Y" and "N" denote whether the corresponding label should be printed to that type's file or not.

while ( <$fh> ) {    #Reading the CSV file

    $filter = $_;
    chomp $filter;
    $filter =~ tr/\r//d;

    if ( $. == 1 ) {
        @fieldNames = split ",", $filter;
    }
    else {
        @fields = split ",", $filter;
        $numCustomers = scalar(@fields) - 2;
        push @labels, $fields[2];

        for ( $i = 0; $i < $numCustomers; $i++ ) {

            for ( $j = 0; $j < scalar(@labels); $j++ ) {
                $customer[$i][$j] = $fields[ 2 + $i ];
            }

            $custFile = "customer" . $i . "_external.h";

            open( $fh1, ">", $custFile ) or die "Unable to create external header file for customer $i";
        }
    }
}

for ( $i = 0; $i < scalar(@labels); $i++ ) {

    for ( $j = 0; $j < $numCustomers; $j++ ) {

        $Hash{ $fieldNames[ 2 + $i ] }->{ $labels[$i] } = $customer[$j][$i];
        push @aoh, %Hash;    #Array of hashes
    }
}

my @headerLines = read_file($intFile);  # read the internal file, and copy only
                                        # those lines that are not marked with
                                        # "N" in the CSV file to the external file.

# iterate over elements of each hash and print the labels only if value is 'Y'

foreach my $headerLine (@headerLines) {

    chomp $headerLine;

    for $i ( 0 .. $#aoh ) {

        for my $cust1 ( sort keys %{ $aoh[$i] } ) {    #HERE

            for my $reqLabel1 ( keys %{ $aoh[$i]{$cust1} } ) {

                print "$cust1, $reqLabel1 : $aoh[$i]{$cust1}{$reqLabel1}\n";

                if ( $aoh[$i]{$cust1}{$reqLabel1} eq "Y" ) {

                    for ( $j = 0; $j < $numCustomers; $j++ ) {
                        $req[$j][$i] = $reqLabel1;
                    }
                }
                else {
                    for ( $j = 0; $j < $numCustomers; $j++ ) {
                        $nreq[$j][$i] = $reqLabel1;
                    }
                }
            }

        }

        if ( grep { $headerLine =~ /$_/ } @nreq ) {
            next;    #Don't print this line in the external file
        }
        else {
            print $fh1 $headerLine . "\n";    #print this line in the external file
        }
    }
}

This complains "Cannot use string Type1 as a hash REF", referring to the line marked as #HERE.

I've tried dumping data structures everywhere, but I'm not sure where this cropped up from.

Any insights would be appreciated.

I have received feedback that using Text::CSV would be a better solution. How would it reduce the need to use nested data structures?

Answer 1:

OK, your problem gets a lot easier with Text::CSV. I would suggest looking at a rewrite, or re-asking your question framed that way.
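As a rough idea of what such a rewrite might look like, here is a minimal sketch, assuming Text::CSV is installed from CPAN and the data follows the layout shown in the question. The sample rows and the %wanted hash are made up for illustration. It reads each row as a hash keyed by column name, so one flat hash of label lists is enough:

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# Sample data in the question's layout (hypothetical values).
my $data = <<'END';
Sl.No, Label, Type1, Type2
1, "label1", Y, N
2, "label2", N, Y
END

# allow_whitespace strips the blanks around each separator.
my $csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } )
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

# The first row supplies the column names used by getline_hr.
my $header = $csv->getline($fh);
my @types  = @{$header}[ 2 .. $#{$header} ];
$csv->column_names(@$header);

# One flat hash: type name => list of labels marked "Y".
my %wanted;
while ( my $row = $csv->getline_hr($fh) ) {
    for my $type (@types) {
        push @{ $wanted{$type} }, $row->{Label} if $row->{$type} eq 'Y';
    }
}

print "$_: @{ $wanted{$_} || [] }\n" for @types;
```

Because each row arrives as a hash reference, there is no need to track column indexes or build an array of hashes of hashes by hand.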

But your problem is actually this:

push @aoh, %Hash;                #Array of hashes

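That doesn't create an array of hashes at all. That extracts all the elements from %Hash (in no particular order, aside from keys and values being paired) and inserts them into @aoh. As a result, some elements of @aoh are plain key strings rather than hash references, which is what the error message is reporting.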

You probably want:

push @aoh, \%Hash;

Or perhaps:

push @aoh, { %Hash }; 
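The difference can be seen in a small standalone sketch. The one-entry %Hash and the @flat and @refs arrays are made up for illustration and are not the question's actual data:

```perl
use strict;
use warnings;

# Hypothetical one-entry hash shaped like the question's %Hash.
my %Hash = ( Type1 => { label1 => 'Y' } );

my ( @flat, @refs );
push @flat, %Hash;     # flattens: pushes the key, then the value
push @refs, \%Hash;    # pushes a single reference to %Hash

print scalar(@flat), " elements, first is '$flat[0]'\n";
print scalar(@refs), " element of type ", ref( $refs[0] ), "\n";

# Dereferencing the plain string fails under strict refs.
my $ok = eval { my @keys = keys %{ $flat[0] }; 1 };
print "Error: $@" unless $ok;
```

The flattened array holds the string "Type1" as its first element, and trying to use that string as a hash reference is what triggers the error.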

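Which of the two you need is not entirely clear to me, because you're reusing %Hash: with \%Hash every element of @aoh refers to the same hash, while { %Hash } stores a shallow copy of its contents at that moment, so you may get duplication either way. This is best dealt with by adding use strict; and use warnings; and lexically scoping your hashes correctly.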
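To illustrate the reuse problem, here is a minimal sketch with made-up labels. The @shared, @copied and @scoped arrays are hypothetical names, not part of the question's code:

```perl
use strict;
use warnings;

my ( @shared, @copied, @scoped );
my %Hash;

for my $label (qw(label1 label2)) {
    %Hash = ( $label => 'Y' );    # the same hash is overwritten each pass
    push @shared, \%Hash;         # every element points at that one hash
    push @copied, {%Hash};        # each element is a fresh shallow copy
}

# A hash declared inside the loop is a new hash on every pass,
# so pushing a reference to it is safe.
for my $label (qw(label1 label2)) {
    my %row = ( $label => 'Y' );
    push @scoped, \%row;
}

my @shared_keys = map { ( keys %$_ )[0] } @shared;
my @copied_keys = map { ( keys %$_ )[0] } @copied;
my @scoped_keys = map { ( keys %$_ )[0] } @scoped;

print "shared: @shared_keys\n";    # shared: label2 label2
print "copied: @copied_keys\n";    # copied: label1 label2
print "scoped: @scoped_keys\n";    # scoped: label1 label2
```

With the shared reference, both array elements show only the last label, which is the kind of duplication that lexical scoping avoids.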



Answer 2:

I'd just keep an array of open file handles (if there aren't too many Types) and print to them while reading the file line by line.

#!/usr/bin/perl
use warnings;
use strict;

chomp( my $header = <> );
my @names = split /, /, $header;

my @handles;
for my $type (@names[ 2 .. $#names ]) {
    open my $fh, '>', $type or die "$type: $!";
    push @handles, $fh;
}

while (<>) {
    chomp;
    my @fields = split /, /;
    for my $index (0 .. $#handles) {
        print { $handles[$index] } $fields[1], "\n" if 'Y' eq $fields[ $index + 2 ];
    }
}

I used the following input to test it:

Sl.No, Label, Type1, Type2, Type3, Type4
1, "label1", Y, N, Y, N
2, "label2", N, Y, Y, N

If your input uses \r\n (CRLF) line endings, read it through the :crlf layer, for example by setting it with binmode on the input handle.
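One way to apply that layer is to name it when opening the file. The following is a minimal sketch, assuming the input is read from a named file rather than through <>. The temporary file and its contents are made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small sample file with CRLF line endings (hypothetical data).
my ( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;    # make sure "\r\n" is written exactly as given
print {$out} "Sl.No, Label, Type1\r\n1, \"label1\", Y\r\n";
close $out or die "close: $!";

# Reading through :crlf turns each "\r\n" into "\n", so chomp is enough.
open my $in, '<:crlf', $path or die "$path: $!";
chomp( my @lines = <$in> );
close $in;

print "[$_]\n" for @lines;
```

With the layer in place, no separate tr/\r//d step is needed after chomp.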