Parse Text file and create complex tree structure

Posted 2019-09-02 07:17

Question:

I have an input text file which looks like this:

DEV=T124^BRD=100^IO=HDMI^MODE=1^REG=ABC^FLD=X^VAL=200
DEV=T124^BRD=100^IO=HDMI^MODE=1^REG=ABC^FLD=Y^VAL=100
DEV=T124^BRD=100^IO=HDMI^MODE=2^REG=ABC^FLD=X^VAL=100
DEV=T124^BRD=100^IO=HDMI^MODE=2^REG=ABC^FLD=Y^VAL=200
DEV=T124^BRD=100^IO=DP^MODE=1^REG=XYZ^FLD=X^VAL=200
DEV=T124^BRD=100^IO=DP^MODE=1^REG=XYZ^FLD=Y^VAL=100
DEV=T124^BRD=100^IO=DP^MODE=1^REG=MLK^FLD=X^VAL=200
DEV=T124^BRD=100^IO=DP^MODE=1^REG=MLK^FLD=Y^VAL=100

and I would like to parse it and output it to a file which looks like this:

DEV:T124
  BRD:100 
    IO:HDMI 
      MODE:1 
        REG:ABC 
          FLD:X,VAL:200                
          FLD:Y,VAL:100          
      MODE:2
        REG:ABC 
          FLD:X,VAL:100                
          FLD:Y,VAL:200          
    IO:DP 
      MODE:1 
        REG:XYZ 
          FLD:X,VAL:200                
          FLD:Y,VAL:100          
        REG:MLK 
          FLD:X,VAL:200                
          FLD:Y,VAL:100

I did look at this example ("List of paths into hash array tree in Perl"), but it doesn't solve my problem completely, because Data::Dumper prints the data in its own tree layout.

Also I am a novice in Perl and don't understand the hash of hashes especially in this comment: https://stackoverflow.com/a/13209256/3430142

I used the code posted in that comment and wrote the following (@rows is an array that contains the lines of the input file).

I don't follow how the foreach loop works, so if I need to change it in the future, I won't know how. That is why I am asking for an alternative implementation that I can understand and customize, rather than relying on that code.
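For readers with the same question, the one-line loop used in the code below ($t = $t->{$_} //= {} for split /\^/ => $input;) can be unrolled into explicit steps. This is only a sketch of what that statement does; insert_path is a hypothetical helper name that does not appear in the original code.

```perl
use strict;
use warnings;

# Expanded form of: $t = $t->{$_} //= {} for split /\^/ => $input;
# insert_path() is a hypothetical name used only for this sketch.
sub insert_path {
    my ($tree, $line) = @_;
    my $node = $tree;                      # start at the root for every line
    for my $part (split /\^/, $line) {     # "DEV=T124", "BRD=100", ...
        # Create an empty child hash the first time this part is seen here.
        $node->{$part} = {} unless defined $node->{$part};
        $node = $node->{$part};            # descend one level
    }
    return $tree;
}

my $tree = {};
insert_path($tree, 'DEV=T124^BRD=100^IO=HDMI^MODE=1^REG=ABC^FLD=X^VAL=200');
insert_path($tree, 'DEV=T124^BRD=100^IO=HDMI^MODE=1^REG=ABC^FLD=Y^VAL=100');

# Both lines share the path down to REG=ABC and branch at FLD.
my $reg = $tree->{'DEV=T124'}{'BRD=100'}{'IO=HDMI'}{'MODE=1'}{'REG=ABC'};
print join(',', sort keys %$reg), "\n";    # prints FLD=X,FLD=Y
```

Each part of a line becomes a hash key whose value is another hash, so lines with a common prefix reuse the same nested hashes and only diverge where their fields differ.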

I used a few Data::Dumper settings to adjust the output. I also used Tie::File to remove the curly braces and quotes.

use Data::Dumper;
use Tie::File;

open TREE, "+>", $ARGV[1] or die $!;
my $tree = {"" => {}};
foreach my $input (@rows) {
    chomp $input;
    my $t = $tree;
    # Descend one level per "KEY=VALUE" part, creating hashes as needed
    $t = $t->{$_} //= {} for split /\^/ => $input;
}

$Data::Dumper::Indent = 1;
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Useqq = 1;
$Data::Dumper::Varname = "PROD";
$Data::Dumper::Terse = 1;
$Data::Dumper::Purity  = 1;
$Data::Dumper::Sparseseen = 1;
$Data::Dumper::Pair = "";
$Data::Dumper::Quotekeys = 1;

print TREE Dumper $tree;
close TREE;

# Re-open the output file as an array of lines and strip braces, commas and quotes
tie my @PST, 'Tie::File', $ARGV[1] or die $!;
for (@PST) {
    s/[\{\},"]//g;
}
untie @PST;

And the output looks like this:

DEV:T124
  BRD:100 
    IO:HDMI 
      MODE:1 
        REG:ABC 
          FLD:X
            VAL:200   



          FLD:Y
            VAL:100     



      MODE:2
        REG:ABC 
          FLD:X
            VAL:100    



          FLD:Y
            VAL:200     




    IO:DP 
      MODE:1 
        REG:XYZ 
          FLD:X
            VAL:200                



          FLD:Y
            VAL:100          



        REG:MLK 
          FLD:X
            VAL:200            



          FLD:Y
            VAL:100

As you can see, I couldn't get rid of the blank lines left behind by removing the curly braces, and I also can't get the structure I want because Data::Dumper has already produced its own predefined tree layout.

Thanks for your help.

Answer 1:

You don't need any external modules for this, or even any complicated data structures. All you need are two arrays: one to hold the current line's data, and another to hold the previous line's data for comparison.

The following script demonstrates the idea, but you'll have to adapt it to your data and to your input/output method:

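use strict;
use warnings;

my @prev;    # fields of the previous line

while (<DATA>) {
    chomp;
    my @data = split;    # split on whitespace
    for my $i (0..$#data) {
        # Print a field when it is new or differs from the previous line's field
        if (! defined $prev[$i] || $data[$i] ne $prev[$i]) {
            @prev = ();    # force every deeper level to print as well
            print '' . ('  ' x $i) . $data[$i] . "\n";
        }
    }
    @prev = @data;
}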

__DATA__
step1a step2a step3a step4a step5a step6a
step1a step2a step3a step4a step5a step6b
step1a step2a step3a step4b step5a step6a
step1a step2a step3a step4b step5a step6b
step1a step2a step3b step4a step5b step6a
step1a step2a step3b step4a step5b step6b
step1a step2a step3b step4a step5c step6a
step1a step2a step3b step4a step5c step6b

Outputs

step1a
  step2a
    step3a
      step4a
        step5a
          step6a
          step6b
      step4b
        step5a
          step6a
          step6b
    step3b
      step4a
        step5b
          step6a
          step6b
        step5c
          step6a
          step6b
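
As a rough sketch of how the script above might be adapted to the question's data, the version below splits on ^, turns the first = of each field into :, and joins the last two fields (FLD and VAL) onto one line. The helper name render_tree is hypothetical, and the sketch assumes the input lines are already grouped so that lines sharing a prefix are adjacent, as they are in the question.

```perl
use strict;
use warnings;

# render_tree() is a hypothetical helper for this sketch. It returns the
# indented text for a list of "KEY=VAL^KEY=VAL^..." lines.
sub render_tree {
    my @lines = @_;
    my @prev;
    my $out = '';
    for my $line (@lines) {
        my @data = split /\^/, $line;
        s/=/:/ for @data;                      # DEV=T124 -> DEV:T124
        # Assumption: the last two fields (FLD, VAL) share one output line.
        if (@data >= 2) {
            my @leaf = splice @data, -2;
            push @data, join ',', @leaf;
        }
        my $changed = 0;
        for my $i (0 .. $#data) {
            # Once one level differs, every deeper level is printed too.
            $changed ||= !defined($prev[$i]) || $data[$i] ne $prev[$i];
            $out .= ('  ' x $i) . "$data[$i]\n" if $changed;
        }
        @prev = @data;
    }
    return $out;
}

print render_tree(
    'DEV=T124^BRD=100^IO=HDMI^MODE=1^REG=ABC^FLD=X^VAL=200',
    'DEV=T124^BRD=100^IO=HDMI^MODE=1^REG=ABC^FLD=Y^VAL=100',
    'DEV=T124^BRD=100^IO=HDMI^MODE=2^REG=ABC^FLD=X^VAL=100',
);
```

Here the $changed flag plays the role of the @prev = () reset in the original loop. Because only adjacent lines are compared, unsorted input would repeat parent lines, and two identical consecutive lines would print nothing for the second one.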


Answer 2:

If you can live with sorted keys, the following right after you create $tree will do mostly what you want based on the way you create your hash:

dump_tree($tree);

sub dump_tree {
    my ($hashR, $indent) = @_;

    $indent ||= 0;          # Default to 0 on the first call (avoids an undef warning)
    foreach my $key (sort keys %$hashR) {
        # Show "KEY:VALUE" instead of "KEY=VALUE"
        (my $print_key = $key) =~ s/=/:/;
        print TREE ((' ' x $indent), "$print_key\n");
        # Print the children two spaces deeper
        dump_tree($hashR->{$key}, $indent+2);
    }
}

It doesn't put FLD: and VAL: on the same line, as your example did, though. That should be a relatively easy addition: before you recurse into dump_tree, check whether the current key has exactly one child and that child has no children of its own.
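
One possible way to make that addition is sketched below. It is not the answerer's code: dump_tree_merged is a hypothetical name, it returns a string instead of printing to TREE so the result is easy to check, and it assumes the tree starts as an empty hash ({}) rather than {"" => {}}, since the empty key would otherwise produce a blank line.

```perl
use strict;
use warnings;

# Variant of dump_tree() that joins a node with its single leaf child,
# e.g. "FLD:X,VAL:200". dump_tree_merged() is a hypothetical name.
sub dump_tree_merged {
    my ($hashR, $indent) = @_;
    $indent ||= 0;
    my $out = '';
    foreach my $key (sort keys %$hashR) {
        (my $print_key = $key) =~ s/=/:/;
        my $child = $hashR->{$key};
        my @kids  = keys %$child;
        # Exactly one child, and that child has no children of its own.
        if (@kids == 1 && !%{ $child->{ $kids[0] } }) {
            (my $leaf = $kids[0]) =~ s/=/:/;
            $out .= (' ' x $indent) . "$print_key,$leaf\n";
        }
        else {
            $out .= (' ' x $indent) . "$print_key\n";
            $out .= dump_tree_merged($child, $indent + 2);
        }
    }
    return $out;
}

# Build a small tree the same way the question does, then print it.
my $tree = {};
for my $line ('DEV=T124^REG=ABC^FLD=X^VAL=200', 'DEV=T124^REG=ABC^FLD=Y^VAL=100') {
    my $t = $tree;
    $t = $t->{$_} //= {} for split /\^/, $line;
}
print dump_tree_merged($tree);
```

Since the keys are sorted as strings, IO=DP would come before IO=HDMI and REG=MLK before REG=XYZ, so the order can differ from the input file even though the nesting is the same.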