What's the best way of parsing a fixed-width f

2019-01-06 18:48发布

I've got a file from a vendor that has 115 fixed-width fields per line. What's the best way of parsing that file into the 115 fields so I can use them in my code?

My first thought is just to make constants for each field like NAME_START_POSITION and NAME_LENGTH and using substring. That just seems ugly so I'm curious if there's any other recommended ways of doing this. None of the couple of libraries a Google search turned up seemed any better either. Thanks

10条回答
一夜七次
2楼-- · 2019-01-06 19:24

You can use \t+ as your delimiter.

Try Something like

String fields[] = line.split("\t+");
查看更多
我命由我不由天
3楼-- · 2019-01-06 19:30
/*The method takes three parameters, fixed length record , length of record which will come from schema , say 10 columns and third parameter is delimiter*/
public class Testing {

    public static void main(String as[]) throws InterruptedException {

        fixedLengthRecordProcessor("1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10", 10, ",");

    }

    public static void fixedLengthRecordProcessor(String input, int reclength, String dilimiter) {
        String[] values = input.split(dilimiter);
        String record = "";
        int recCounter = 0;
        for (Object O : values) {

            if (recCounter == reclength) {
                System.out.println(record.substring(0, record.length() - 1));// process
                                                                                // your
                                                                                // record
                record = "";
                record = record + O.toString() + ",";
                recCounter = 1;
            } else {

                record = record + O.toString() + ",";

                recCounter++;

            }

        }
        System.out.println(record.substring(0, record.length() - 1)); // process
                                                                        // your
                                                                        // record
    }

}
查看更多
唯我独甜
4楼-- · 2019-01-06 19:32

Here is a basic implementation I use:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class FlatFileParser {

  public static void main(String[] args) {
    File inputFile = new File("data.in");
    File outputFile = new File("data.out");
    int columnLengths[] = {7, 4, 10, 1};
    String charset = "ISO-8859-1";
    String delimiter = "~";

    System.out.println(
        convertFixedWidthFile(inputFile, outputFile, columnLengths, delimiter, charset)
        + " lines written to " + outputFile.getAbsolutePath());
  }

  /**
   * Converts a fixed width file to a delimited file.
   * <p>
   * This method ignores (consumes) newline and carriage return
   * characters. Lines returned is based strictly on the aggregated
   * lengths of the columns.
   *
   * A RuntimeException is thrown if run-off characters are detected
   * at eof.
   *
   * @param inputFile the fixed width file
   * @param outputFile the generated delimited file
   * @param columnLengths the array of column lengths
   * @param delimiter the delimiter used to split the columns
   * @param charsetName the charset name of the supplied files
   * @return the number of completed lines
   */
  public static final long convertFixedWidthFile(
      File inputFile,
      File outputFile,
      int columnLengths[],
      String delimiter,
      String charsetName) {

    InputStream inputStream = null;
    Reader inputStreamReader = null;
    OutputStream outputStream = null;
    Writer outputStreamWriter = null;
    String newline = System.getProperty("line.separator");
    String separator;
    int data;
    int currentIndex = 0;
    int currentLength = columnLengths[currentIndex];
    int currentPosition = 0;
    long lines = 0;

    try {
      inputStream = new FileInputStream(inputFile);
      inputStreamReader = new InputStreamReader(inputStream, charsetName);
      outputStream = new FileOutputStream(outputFile);
      outputStreamWriter = new OutputStreamWriter(outputStream, charsetName);

      while((data = inputStreamReader.read()) != -1) {
        if(data != 13 && data != 10) {
          outputStreamWriter.write(data);
          if(++currentPosition > (currentLength - 1)) {
            currentIndex++;
            separator = delimiter;
            if(currentIndex > columnLengths.length - 1) {
              currentIndex = 0;
              separator = newline;
              lines++;
            }
            outputStreamWriter.write(separator);
            currentLength = columnLengths[currentIndex];
            currentPosition = 0;
          }
        }
      }
      if(currentIndex > 0 || currentPosition > 0) {
        String line = "Line " + ((int)lines + 1);
        String column = ", Column " + ((int)currentIndex + 1);
        String position = ", Position " + ((int)currentPosition);
        throw new RuntimeException("Incomplete record detected. " + line + column + position);
      }
      return lines;
    }
    catch (Throwable e) {
      throw new RuntimeException(e);
    }
    finally {
      try {
        inputStreamReader.close();
        outputStreamWriter.close();
      }
      catch (Throwable e) {
        throw new RuntimeException(e);
      }
    }
  }
}
查看更多
Root(大扎)
5楼-- · 2019-01-06 19:36

I've played arround with fixedformat4j and it is quite nice. Easy to configure converters and the like.

查看更多
再贱就再见
6楼-- · 2019-01-06 19:37

The Apache Commons CSV project can handle fixed with files.

Looks like the fixed width functionality didn't survive promotion from the sandbox.

查看更多
ら.Afraid
7楼-- · 2019-01-06 19:41

If your string is called inStr, convert it to a char array and use the String(char[], start, length) constructor

char[] intStrChar = inStr.toCharArray();
String charfirst10 = new String(intStrChar,0,9);
String char10to20 = new String(intStrChar,10,19);
查看更多
登录 后发表回答