PHP library for creating/manipulating fixed-width

Posted 2019-03-14 16:09

Question:

We have a web application that does time-tracking, payroll, and HR. As a result, we have to write a lot of fixed-width data files for export into other systems (state tax filings, ACH files, etc). Does anyone know of a good library for this where you can define the record types/structures, and then act on them in an OOP paradigm?

The idea would be a class that you hand a specification, and then you work with an instance tied to that specification. For example:

$icesa_file = new FixedWidthFile();
$icesa_file->setSpecification('icesa.xml');
$icesa_file->addEmployer( $some_data_structure );

Where icesa.xml is a file that contains the spec, although you could just use OOP calls to define it yourself:

$specification = new FixedWidthFileSpecification('ICESA');
$specification->addRecordType(
    $record_type_name = 'Employer',
    $record_fields = array(
         array('Field Name', $width, $validation_type, $options)
         )
     );

EDIT: I'm not looking for advice on how to write such a library--I just wanted to know if one already existed. Thank you!!

Answer 1:

I don't know of a library that does exactly what you want, but it should be rather straightforward to roll your own classes that handle this. Assuming that you are mainly interested in writing data in these formats, I would use the following approach:

(1) Write a lightweight formatter class for fixed-width strings. It must support user-defined record types and should be flexible with regard to allowed formats.

(2) Instantiate this class for every file format you use and add the required record types.

(3) Use this formatter to format your data.

As you suggested, you could define the record types in XML and load this XML file in step (2). I don't know how experienced you are with XML, but in my experience XML formats often cause a lot of headaches (probably due to my own incompetence regarding XML). If you are going to use these classes only in your PHP program, there's not much to gain from defining your format in XML. Using XML is a good option if you need to use the file format definitions in many other applications as well.
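If the XML route is taken anyway, step (2) could be sketched roughly as follows. This is only an illustration: the XML element and attribute names and the loadRecordTypes() function are invented here, and the resulting nested array is assumed to mirror the declaration style used later in this answer.

```php
<?php
// Hypothetical XML layout for a record type; the element and attribute
// names are made up for this sketch and do not come from any real spec.
$xml = <<<XML
<format name="ICESA">
  <record name="A-RECORD">
    <field name="record-identifier" length="1" value="A"/>
    <field name="year" length="4" format="%-4d" source="year"/>
    <field name="transmitter-name" length="50" format="%-50s" source="tname"/>
  </record>
</format>
XML;

// Hypothetical loader: converts the XML into a nested array keyed by
// record type name and then by field name.
function loadRecordTypes($xmlString) {
    $doc = simplexml_load_string($xmlString);
    if ($doc === false) {
        throw new InvalidArgumentException('Could not parse format XML');
    }
    $recordTypes = array();
    foreach ($doc->record as $record) {
        $fields = array();
        foreach ($record->field as $field) {
            $declaration = array('length' => (int) $field['length']);
            if (isset($field['value'])) {
                // constant field
                $declaration['value'] = (string) $field['value'];
            } else {
                // printf-style field fed from the input data
                $declaration['format'] = (string) $field['format'];
                $declaration['sourceField'] = (string) $field['source'];
            }
            $fields[(string) $field['name']] = $declaration;
        }
        $recordTypes[(string) $record['name']] = $fields;
    }
    return $recordTypes;
}

$recordTypes = loadRecordTypes($xml);
print_r(array_keys($recordTypes['A-RECORD']));
```

Closures for custom formatting cannot be expressed in such an XML file, which is one practical argument for keeping the declaration in PHP.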

To illustrate my ideas, here is how I think you would use this suggested formatter class:

<?php
include 'FixedWidthFormatter.php'; // contains the FixedWidthFormatter class
include 'icesa-format-declaration.php'; // contains $icesaFormatter
$file = fopen("icesafile.txt", "w");

fputs ($file, $icesaFormatter->formatRecord( 'A-RECORD', array( 
    'year' => 2011, 
    'tein' => '12-3456789-P',
    'tname'=> 'Willie Nelson'
)));
// output: A2011123456789UTAX     Willie Nelson                                     

// etc...

fclose ($file);
?>

The file icesa-format-declaration.php could contain the declaration of the format somehow like this:

<?php
$icesaFormatter = new FixedWidthFormatter();
$icesaFormatter->addRecordType( 'A-RECORD', array(
    // the first field is the record identifier
    // for A records, this is simply the character A
    'record-identifier' => array(
        'value' => 'A',  // constant string
        'length' => 1 // not strictly necessary
                      // used for error checking
    ),
    // the year is a 4 digit field
    // it can simply be formatted printf style
    // sourceField defines which key from the input array is used
    'year' =>  array(
        'format' => '% -4d',  // 4 characters, left justified, space padded
        'length' => 4,
        'sourceField' => 'year'
    ),
    // the EIN is a more complicated field
    // we must strip hyphens and suffixes, so we define
    // a closure that performs this formatting
    'transmitter-ein' => array(
        'formatter'=> function($EIN){
            $cleanedEIN =  preg_replace('/\D+/','',$EIN); // remove anything that's not a digit
            return sprintf('%-9s', $cleanedEIN); // left justified and padded with blanks; %s keeps leading zeros
        },
        'length' => 9,
        'sourceField' => 'tein'
    ),
    'tax-entity-code' => array(
        'value' => 'UTAX',  // constant string
        'length' => 4
    ),
    'blanks' => array(
        'value' => '     ',  // constant string
        'length' => 5
    ),
    'transmitter-name' =>  array(
        'format' => '% -50s',  // 50 characters, left justified, space padded
        'length' => 50,
        'sourceField' => 'tname'
    ),
    // etc. etc.
));
?>

Then you only need the FixedWidthFormatter class itself, which could look like this:

<?php

class FixedWidthFormatter {

    private $recordTypes = array();

    public function addRecordType( $recordTypeName, $recordTypeDeclaration ){
        // perform some checking to make sure that $recordTypeDeclaration is valid
        $this->recordTypes[$recordTypeName] = $recordTypeDeclaration;
    }

    public function formatRecord( $type, $data ) {
        if (!array_key_exists($type, $this->recordTypes)) {
            trigger_error("Undefined record type: '$type'");
            return "";
        }
        $output = '';
        $typeDeclaration = $this->recordTypes[$type];
        foreach($typeDeclaration as $fieldName => $fieldDeclaration) {
            // there are three possible field variants:
            //  - constant fields
            //  - fields formatted with printf
            //  - fields formatted with a custom function/closure
            if (array_key_exists('value',$fieldDeclaration)) {
                $value = $fieldDeclaration['value'];
            } else if (array_key_exists('format',$fieldDeclaration)) {
                $value = sprintf($fieldDeclaration['format'], $data[$fieldDeclaration['sourceField']]);
            } else if (array_key_exists('formatter',$fieldDeclaration)) {
                $value = $fieldDeclaration['formatter']($data[$fieldDeclaration['sourceField']]);
            } else {
                trigger_error("Invalid field declaration for field '$fieldName' record type '$type'");
                return '';
            }

            // check if the formatted value has the right length
            if (strlen($value)!=$fieldDeclaration['length']) {
                trigger_error("The formatted value '$value' for field '$fieldName' record type '$type' is not of correct length ({$fieldDeclaration['length']}).");
                return '';
            }
            $output .= $value;
        }
        return $output . "\n";
    }
}


?>
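The class above leaves the declaration check in addRecordType() as a comment. One way it might be filled in is sketched below; the function name validateRecordTypeDeclaration() and the exact rules are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: returns a list of problems found in a record type
// declaration (an empty array means the declaration looks usable).
function validateRecordTypeDeclaration(array $declaration) {
    $errors = array();
    foreach ($declaration as $fieldName => $field) {
        if (!isset($field['length']) || !is_int($field['length']) || $field['length'] < 1) {
            $errors[] = "Field '$fieldName': 'length' must be a positive integer";
            continue;
        }
        if (array_key_exists('value', $field)) {
            // constant fields can be checked completely up front
            if (strlen($field['value']) !== $field['length']) {
                $errors[] = "Field '$fieldName': constant value does not match length";
            }
        } elseif (isset($field['format']) || isset($field['formatter'])) {
            if (!isset($field['sourceField'])) {
                $errors[] = "Field '$fieldName': missing 'sourceField'";
            }
            if (isset($field['formatter']) && !is_callable($field['formatter'])) {
                $errors[] = "Field '$fieldName': 'formatter' is not callable";
            }
        } else {
            $errors[] = "Field '$fieldName': needs 'value', 'format' or 'formatter'";
        }
    }
    return $errors;
}

$errors = validateRecordTypeDeclaration(array(
    'record-identifier' => array('value' => 'A', 'length' => 1),
    'tax-entity-code'   => array('value' => 'UTAX', 'length' => 5), // wrong length
    'year'              => array('format' => '%-4d', 'length' => 4), // no sourceField
));
print_r($errors);
```

Checking constant fields at declaration time means a mistyped spec fails once at startup instead of on every record written.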

If you need read support as well, the formatter class could be extended to parse records too, but that is beyond the scope of this answer.
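As a rough starting point for reading, the 'length' entries of a declaration are enough to cut a line back into fields. The parseRecord() function below is a hypothetical sketch under that assumption; it only trims right-hand space padding and does no type conversion.

```php
<?php
// Hypothetical reader: walks the same kind of declaration used for
// writing and cuts the line into fields by their declared lengths.
function parseRecord(array $typeDeclaration, $line) {
    $line = rtrim($line, "\r\n");
    $fields = array();
    $offset = 0;
    foreach ($typeDeclaration as $fieldName => $fieldDeclaration) {
        $length = $fieldDeclaration['length'];
        $raw = (string) substr($line, $offset, $length);
        if (strlen($raw) !== $length) {
            throw new UnexpectedValueException("Line too short for field '$fieldName'");
        }
        $fields[$fieldName] = rtrim($raw, ' '); // drop the space padding
        $offset += $length;
    }
    return $fields;
}

// Only the lengths matter for reading, so the other keys are left out here.
$declaration = array(
    'record-identifier' => array('length' => 1),
    'year'              => array('length' => 4),
    'transmitter-ein'   => array('length' => 9),
    'tax-entity-code'   => array('length' => 4),
    'blanks'            => array('length' => 5),
    'transmitter-name'  => array('length' => 50),
);
$line = 'A' . '2011' . '123456789' . 'UTAX' . str_repeat(' ', 5)
      . str_pad('Willie Nelson', 50) . "\n";
$record = parseRecord($declaration, $line);
print_r($record);
```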



Answer 2:

I have happily used this class for a similar purpose before. It is a php-classes file, but it is very well rated and has been tried and tested by many. It is not new (2003), but it still does a very fine job and has a decent, clean API that looks somewhat like the example you posted, with many other goodies added.

If you can disregard the German used in the examples and the age factor, it is a very decent piece of code.

Posted from the example:


//CSV-Datei mit Festlängen-Werten 
echo "<p>Import aus der Datei fixed.csv</p>"; 
$csv_import2 = new CSVFixImport; 
$csv_import2->setFile("fixed.csv"); 
$csv_import2->addCSVField("Satzart", 2); 
$csv_import2->addCSVField("Typ", 1); 
$csv_import2->addCSVField("Gewichtsklasse", 1); 
$csv_import2->addCSVField("Marke", 4); 
$csv_import2->addCSVField("interne Nummer", 4); 


$csv_import2->addFilter("Satzart", "==", "020"); 
$csv_import2->parseCSV(); 
if($csv_import2->isOK()) 
{ 
    echo "Anzahl der Datensätze: <b>" . $csv_import2->CSVNumRows() . "</b><br>"; 
    echo "Anzahl der Felder: <b>" . $csv_import2->CSVNumFields() . "</b><br>"; 
    echo "Name des 1.Feldes: <b>" . $csv_import2->CSVFieldName(0) . "</b><br>"; 

    $csv_import2->dumpResult(); 
}

My 2 cents, good luck!



Answer 3:

I don't know of any PHP library that specifically handles fixed-width records. But there are some good libraries for filtering and validating a row of data fields if you can do the job of breaking up each line of the file yourself.

Take a look at the Zend_Filter and Zend_Validate components from Zend Framework. I think both components are fairly self-contained and require only Zend_Loader to work. If you want you can pull just those three components out of Zend Framework and delete the rest of it.

Zend_Filter_Input acts like a collection of filters and validators. You define a set of filters and validators for each field of a data record which you can use to process each record of a data set. There are lots of useful filters and validators already defined and the interface to write your own is pretty straightforward. I suggest the StringTrim filter for removing padding characters.

To break up each line into fields I would extend the Zend_Filter_Input class and add a method called setDataFromFixedWidth(), like so:

class My_Filter_Input extends Zend_Filter_Input
{
    public function setDataFromFixedWidth($record, array $recordRules)
    {
        if (array_key_exists('regex', $recordRules)) {
            $recordRules = array($recordRules);
        }

        foreach ($recordRules as $rule) {
            $matches = array();
            if (preg_match($rule['regex'], $record, $matches)) {
                // $matches[0] is the whole match; only the captured groups map to fields
                $data = array_combine($rule['fields'], array_slice($matches, 1));
                return $this->setData($data);
            }
        }

        return $this->setData(array());
    }

}

And define the various record types with simple regular expressions and matching field names. ICESA might look something like this:

$recordRules = array(
    array(
        'regex'  => '/^(A)(.{4})(.{9})(.{4})/',  // This is only the first four fields, obviously
        'fields' => array('recordId', 'year', 'federalEin', 'taxingEntity',),
    ),
    array(
        'regex'  => '/^(B)(.{4})(.{9})(.{8})/',
        'fields' => array('recordId', 'year', 'federalEin', 'computer',),
    ),
    array(
        'regex'  => '/^(E)(.{4})(.{9})(.{9})/',
        'fields' => array('recordId', 'paymentYear', 'federalEin', 'blank1',),
    ),
    array(
        'regex'  => '/^(S)(.{9})(.{20})(.{12})/',
        'fields' => array('recordId', 'ssn', 'lastName', 'firstName',),
    ),
    array(
        'regex'  => '/^(T)(.{7})(.{4})(.{14})/',
        'fields' => array('recordId', 'totalEmployees', 'taxingEntity', 'stateQtrTotal'),
    ),
    array(
        'regex'  => '/^(F)(.{10})(.{10})(.{4})/',
        'fields' => array('recordId', 'totalEmployees', 'totalEmployers', 'taxingEntity',),
    ),
);
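To see how a single rule splits a line, here is a standalone sketch that needs no Zend classes. The sample line is made up for illustration; note that preg_match() stores the full match at index 0, so that element has to be skipped before pairing the captured groups with the field names.

```php
<?php
$rule = array(
    'regex'  => '/^(A)(.{4})(.{9})(.{4})/',
    'fields' => array('recordId', 'year', 'federalEin', 'taxingEntity'),
);
$line = 'A2011123456789UTAX     Willie Nelson'; // invented sample record

$fields = null;
if (preg_match($rule['regex'], $line, $matches)) {
    // $matches[0] holds the whole match, so skip it before pairing
    // the captured groups with the field names.
    $fields = array_combine($rule['fields'], array_slice($matches, 1));
}
print_r($fields);
```

Because each regex starts with a distinct record letter, at most one rule matches a given line, so the order of the rules does not matter.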

Then you can read your data file line by line and feed it into the input filter:

$input = new My_Filter_Input($inputFilterRules, $inputValidatorRules);
foreach (file($filename) as $line) {
    $input->setDataFromFixedWidth($line, $recordRules);
    if ($input->isValid()) {
        // do something useful
    }
    else {
        // scream and shout
    }
}

To format data for writing back to the file, you would probably want to write your own StringPad filter that wraps the internal str_pad function. Then for each record in your data set:

$output = new My_Filter_Input($outputFilterRules);
foreach ($dataset as $record) {
    $output->setData($record);
    $line = implode('', $output->getEscaped()) . "\n";
    fwrite($outputFile, $line);
}
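The StringPad filter mentioned above does not ship with Zend Framework, so it has to be written. A minimal sketch might look like the following; the class name My_Filter_StringPad and its constructor arguments are assumptions, and in a Zend Framework 1 project the class would additionally declare that it implements Zend_Filter_Interface, which is left out here so the example runs on its own.

```php
<?php
// Hypothetical filter; in a Zend Framework 1 project it would also
// declare "implements Zend_Filter_Interface", omitted to stay standalone.
class My_Filter_StringPad
{
    private $length;
    private $padString;
    private $padType;

    public function __construct($length, $padString = ' ', $padType = STR_PAD_RIGHT)
    {
        $this->length = $length;
        $this->padString = $padString;
        $this->padType = $padType;
    }

    public function filter($value)
    {
        // str_pad() never shortens, so cut over-long values first
        $value = substr((string) $value, 0, $this->length);
        return str_pad($value, $this->length, $this->padString, $this->padType);
    }
}

$nameFilter = new My_Filter_StringPad(20);                     // text: pad right with spaces
$amountFilter = new My_Filter_StringPad(7, '0', STR_PAD_LEFT); // numbers: pad left with zeros
echo '[' . $nameFilter->filter('Nelson') . ']' . "\n";
echo '[' . $amountFilter->filter(42) . ']' . "\n";
```

Silently truncating over-long values keeps every line at the right width, but for amounts it may be safer to raise an error instead; which behavior is right depends on the target format.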

Hope this helps!



Answer 4:

I think you need a bit more information than you supplied: What kind of data structures would you like to use for your records and column definitions? It seems like this is a rather specialized class that would require customization for your specific use case.

I have a PHP class that I wrote that basically does what you are looking for, but relying on other classes that we use in our system. If you can supply the types of data structures you want to use it with I can check if it will work for you and send it over.

Note: I published this answer before from a public computer and I could not get it to appear to be from me (it showed as some random user). If you see it, please ignore the answer from 'john'.



Answer 5:

If this is a text file with separated fields, you will need to write it yourself. It is probably not a big problem, and good organization will save a lot of time.

  1. You need a universal way of defining structures, e.g. XML.
  2. You need something to generate the output; personally I prefer Smarty templating for this.

So a structure like this one:

   <group>
      <entry>123</entry>
      <entry>123</entry>
      <entry>123</entry>
   </group>

can easily be turned into text with this template:

{section name=x1 loop=$level1_arr}
  {* output the root's fields *}
  {section name=x2 loop=$level1_arr[x1].level2_arr}
     {* output the entry's fields *}
  {/section}
{/section}

This is just an idea.

But imagine:

  1. You need XML
  2. You need a template

That is, two definitions are enough to abstract any text structure.



Answer 6:

Perhaps the dbase functions are what you want to use. They are not OOP, but it probably would not be too difficult to build a class that would act on the functions provided in the dbase set.

Take a look at the link below for details on dbase functionality available in PHP. If you're just looking to create a file for import into another system, these functions should work for you. Just make sure you pay attention to the warnings. Some of the key warnings are:

  • There is no support for indexes or memo fields.
  • There is no support for locking.
  • Two concurrent web server processes modifying the same dBase file will very likely ruin your database.

http://php.net/manual/en/book.dbase.php



Answer 7:

I'm sorry I can't point you to a specific class. I have seen something that does this, but I can't remember where. It should be simple for a coder to build, though.

This is how I have seen it work in an example:

PHP reads in the data.

PHP then uses a flag (e.g. $_GET['type']) to decide how to output the data, e.g. printer, HTML, or Excel.

So you build a template file for each version and, depending on the flag, load and use the matching template. As for fixed width, this is an HTML thing, not PHP, so it should be done in the template's CSS.

From this you can output your data however any user requires it.

Smarty templates are quite good for this, combined with the PHP header() function to send the content type when required.