Parse fixed-width files

I have a lot of text files with fixed-width fields:

<c>     <c>       <c>
Dave    Thomas    123 Main
Dan     Anderson  456 Center
Wilma   Rainbow   789 Street

The rest of the files are in a similar format, where each <c> marks the beginning of a column, but they have various (unknown) column and space widths. What's the best way to parse these files?

I tried using Text::CSV, but since there's no delimiter it's hard to get a consistent result (unless I'm using the module wrong):

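use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new();
$csv->sep_char(' ');   # every single space is a separator, so runs of padding
                       # become empty fields and "123 Main" is split in two

while (<FILE>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print $columns[1] . "\n";
    }
}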

Tags: perl parsing

3 answers

Rolldiameter

Post #2 · 2019-02-16 16:22

Just use Perl's unpack function. Something like this:

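while (<FILE>) {
    # A8/A10: fixed-width fields, trailing spaces stripped; A*: rest of the line
    my ($first, $last, $street) = unpack("A8 A10 A*", $_);

    # ... do something with $first, $last and $street ...
}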

In the unpack template, each A is followed by the width of its field ("A###"); when unpacking, A also strips trailing whitespace from the value. There are many other formats you can mix and match with, such as integer fields. If the file is truly fixed-width, like mainframe files, this should be the easiest approach.
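A minimal self-contained sketch of the unpack approach, assuming the widths are already known (8 and 10 characters, then the rest of the line, as in the sample from the question). It also shows the difference between A, which strips trailing whitespace, and a, which returns the field with its padding intact:

```perl
use strict;
use warnings;

# Sample rows from the question; widths 8 and 10 are read off the header.
my @lines = (
    "Dave    Thomas    123 Main\n",
    "Dan     Anderson  456 Center\n",
    "Wilma   Rainbow   789 Street\n",
);

for my $line (@lines) {
    # A8  : first 8 characters, trailing whitespace stripped
    # A10 : next 10 characters, trailing whitespace stripped
    # A*  : everything left on the line (the trailing newline is stripped too)
    my ($first, $last, $street) = unpack 'A8 A10 A*', $line;
    print join('|', $first, $last, $street), "\n";
}

# 'a' keeps the padding: this yields "Dave    " (8 characters, spaces included)
my ($padded) = unpack 'a8', $lines[0];
print "[$padded]\n";
```

This prints `Dave|Thomas|123 Main`, `Dan|Anderson|456 Center`, `Wilma|Rainbow|789 Street` and then `[Dave    ]`.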


叛逆

Post #3 · 2019-02-16 16:30

As user604939 mentions, unpack is the tool to use for fixed-width fields. However, unpack needs a template to work with. Since you say your field widths can change, one solution is to build this template from the first line of your file:

my @template = map {'A'.length}        # convert each to 'A##'
               <DATA> =~ /(\S+\s*)/g;  # split first line into segments
$template[-1] = 'A*';                  # set the last segment to be slurpy

my $template = "@template";
print "template: $template\n";

my @data;
while (<DATA>) {
    push @data, [unpack $template, $_]
}

use Data::Dumper;

print Dumper \@data;

__DATA__
<c>     <c>       <c>
Dave    Thomas    123 Main
Dan     Anderson  456 Center
Wilma   Rainbow   789 Street

which prints:

template: A8 A10 A*
$VAR1 = [
          [
            'Dave',
            'Thomas',
            '123 Main'
          ],
          [
            'Dan',
            'Anderson',
            '456 Center'
          ],
          [
            'Wilma',
            'Rainbow',
            '789 Street'
          ]
        ];
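The regex above measures each header segment from one non-space run to the next, which works for the sample header. A variant sketch that works directly from marker offsets, assuming every column starts exactly at a `<c>` marker, also copes with a header whose first marker is indented, by skipping the leading bytes with unpack's `x`. The helper name `template_from_header` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: build an unpack template from the offsets of the
# <c> markers in a header line. Assumes every column starts at a marker.
sub template_from_header {
    my ($header) = @_;
    my @starts;
    while ($header =~ /<c>/g) {
        push @starts, $-[0];    # offset where this match began
    }
    # Width of each column = distance to the next marker.
    my @widths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
    my @parts  = map { "A$_" } @widths;
    # Skip any text before the first marker.
    unshift @parts, "x$starts[0]" if @starts && $starts[0] > 0;
    return join ' ', @parts, 'A*';    # last column is slurpy
}

my $template = template_from_header("<c>     <c>       <c>\n");
print "template: $template\n";

for my $line ("Dave    Thomas    123 Main\n", "Wilma   Rainbow   789 Street\n") {
    print join('|', unpack $template, $line), "\n";
}
```

For the sample header the markers sit at offsets 0, 8 and 18, so this builds the same `A8 A10 A*` template as the code above.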


时光不老，我们不散

Post #4 · 2019-02-16 16:47

CPAN to the rescue!

DataExtract::FixedWidth not only parses fixed-width files but, based on its POD, appears to be smart enough to figure out the column widths from the header line by itself!
