I have a lot of text files with fixed-width fields:
<c> <c> <c>
Dave Thomas 123 Main
Dan Anderson 456 Center
Wilma Rainbow 789 Street
The rest of the files are in a similar format, where the <c> will mark the beginning of a column, but they have various (unknown) column and space widths. What's the best way to parse these files?
I tried using Text::CSV, but since there's no delimiter it's hard to get a consistent result (unless I'm using the module wrong):
use Text::CSV;

my $csv = Text::CSV->new();
$csv->sep_char(' ');    # treat a single space as the separator
while (<FILE>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print $columns[1] . "\n";
    }
}
Just use Perl's unpack function.
Inside the unpack template, each "A###" stands for one field, where ### is the width of that field. There are a variety of other formats that you can mix and match with it, such as integer fields. If the file is fixed width, like mainframe files, then this should be the easiest approach.
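As a rough sketch of the unpack approach: the widths below (8, 13, then "the rest of the line") are assumptions chosen for illustration, not values taken from the question, and parse_line is a hypothetical helper name.

```perl
use strict;
use warnings;

# Assumed layout: an 8-character field, a 13-character field, then the
# remainder of the line. Adjust the numbers to match the real file.
my $template = 'A8 A13 A*';

# Hypothetical helper: split one fixed-width line into its fields.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # "A" extracts a space-padded string and strips trailing spaces.
    return unpack $template, $line;
}

# Build a sample line with sprintf so the padding is explicit.
my $sample = sprintf "%-8s%-13s%s\n", 'Dave', 'Thomas', '123 Main';
my @fields = parse_line($sample);
print join('|', @fields), "\n";    # Dave|Thomas|123 Main
```

Whitespace inside the template is ignored, so the fields can be spaced out for readability.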
As user604939 mentions, unpack is the tool to use for fixed-width fields. However, unpack needs to be passed a template to work with. Since you say your fields can change width, the solution is to build this template from the first line of your file.
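One way this could look, as a minimal sketch rather than a definitive implementation: it assumes every column starts exactly where a <c> marker starts in the header line, and build_template is a hypothetical helper name. The sample header and data are built in memory with assumed widths; in practice the header would be the first line read from the file.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a header line with "<c>" markers into an
# unpack template, one "A<width>" per column and "A*" for the last one.
sub build_template {
    my ($header) = @_;
    chomp $header;

    # Record the offset at which each "<c>" marker begins.
    my @starts;
    while ($header =~ /<c>/g) {
        push @starts, $-[0];
    }
    die "no <c> markers found in header\n" unless @starts;

    # Each column's width is the distance to the next marker.
    my @parts;
    for my $i (0 .. $#starts - 1) {
        push @parts, 'A' . ($starts[$i + 1] - $starts[$i]);
    }
    push @parts, 'A*';    # last column runs to the end of the line

    # Skip any leading characters before the first marker.
    unshift @parts, "x$starts[0]" if $starts[0] > 0;

    return join ' ', @parts;
}

# Sample data with assumed widths of 8 and 13 for the first two columns.
my $header   = sprintf '%-8s%-13s%s', '<c>', '<c>', '<c>';
my $template = build_template($header);
my @fields   = unpack $template,
    sprintf('%-8s%-13s%s', 'Wilma', 'Rainbow', '789 Street');

print "$template\n";               # A8 A13 A*
print join('|', @fields), "\n";    # Wilma|Rainbow|789 Street
```

Because the template is rebuilt per file, each file can use different column widths as long as its header line carries the markers.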
CPAN to the rescue!
DataExtract::FixedWidth not only parses fixed-width files but, based on its POD, also appears to be smart enough to figure out the column widths from the header line by itself!