Parsing a file with single-line and multi-line data

Posted 2019-06-13 14:06

I have a text file dump that I need to convert to a delimited file. The file contains a series of "records" (for lack of a better word) formatted like this:

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 123456
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 234567
Problem: foo bar in multiple lines of text
          which may include <newline> and 
          extend to multiple lines of text
Resolution: foo un-barred in multiple lines of text

...

Right now, with Java, I'm using StringBuffer to read this file line by line, parsing each line into individual fields with a series of if(inputLine.toLowerCase().startsWith("user:")) checks, and writing a final delimited line to a text file.

However, the Problem and Resolution fields are free-form and may span multiple lines. I'm trying to build two Strings: one that appends all lines following Problem: up to Resolution:, and one that appends all lines following Resolution: up to Form:.

I've already viewed this link and this link, which suggest that StringBuilder might be an appropriate way to do this... however, I'm not quite sure how to construct the logic.

EDIT: Since I'm reading line-by-line, I'm having a hard time wrapping my head around how to code

<pseudocode>
If the line starts with "Problem", extract the characters after "Problem"; else
if the PRIOR line starts with "Problem" and the current line doesn't start with "Resolution", then append the characters in the line to the prior line
etc.
</pseudocode>

but then, what if there's a third line of "Problem"...? I just can't visualize how to make it work.

Any ideas or alternate methods of achieving my desired results?

3 Answers
走好不送
Reply #2 · 2019-06-13 14:24

Hi, if I understand your problem correctly, then something along these lines should work:

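    // Assumes `reader` is the BufferedReader the file is being read from
    // and `inputLine` holds the current line.
    StringBuilder problemDesc = new StringBuilder();
    if (inputLine.toLowerCase().startsWith("problem:")) {
        // Keep only the text after the "Problem:" label
        problemDesc.append(inputLine.substring("problem:".length()).trim());
        // Read ahead until the Resolution line (or end of file) is reached
        while ((inputLine = reader.readLine()) != null
                && !inputLine.toLowerCase().startsWith("resolution:")) {
            problemDesc.append(' ').append(inputLine.trim());
        }
        // Deal with the problem description here. inputLine now holds the
        // line with Resolution in it (or null at end of file); repeat the
        // same logic to retrieve the resolution value.
    }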
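
To address the "what about a third line?" worry in the question, here is a minimal self-contained sketch of the same idea written as a small state machine: remember which field is currently open, and treat any line that does not start with a known label as a continuation of that field. The class name `RecordParser`, the label list, and the rule that `User:` opens a new record are assumptions based on the sample in the question, not something the original post specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordParser {
    // Labels taken from the sample in the question; adjust to the real dump.
    private static final String[] LABELS =
            {"User", "Date", "Subject", "Project", "Problem", "Resolution"};

    /** Returns the label that starts this line, or null for a continuation line. */
    static String labelOf(String line) {
        String lower = line.toLowerCase();
        for (String label : LABELS) {
            if (lower.startsWith(label.toLowerCase() + ":")) {
                return label;
            }
        }
        return null;
    }

    static List<Map<String, String>> parse(BufferedReader reader) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> record = null;
        String field = null;                 // label of the field being collected
        StringBuilder value = new StringBuilder();

        String line;
        while ((line = reader.readLine()) != null) {
            String label = labelOf(line);
            if (label == null) {
                // Continuation line: it belongs to whichever field is open.
                String text = line.trim();
                if (field != null && !text.isEmpty()) {
                    value.append(' ').append(text);
                }
                continue;
            }
            // A new label closes the previous field.
            if (field != null) {
                record.put(field, value.toString().trim());
            }
            // Assumption: "User" is always the first field of a record.
            if (label.equals("User") || record == null) {
                record = new LinkedHashMap<>();
                records.add(record);
            }
            field = label;
            value.setLength(0);
            value.append(line.substring(label.length() + 1).trim());
        }
        // Flush the field that was still open at end of input.
        if (field != null) {
            record.put(field, value.toString().trim());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        String sample = String.join("\n",
                "User: abc123 ",
                "Problem: foo bar in multiple lines of text",
                "          which may include <newline> and",
                "          extend to multiple lines of text",
                "Resolution: foo un-barred in multiple lines of text");
        for (Map<String, String> r : parse(new BufferedReader(new StringReader(sample)))) {
            System.out.println(r);
        }
    }
}
```

Because the loop only ever looks at the current line and the currently open field, it does not matter whether Problem runs for one line or ten. One caveat: a continuation line that itself happens to begin with one of the labels (for example free text starting with "Date:") would be misread as a new field.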
疯言疯语
Reply #3 · 2019-06-13 14:27
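// Assumes `reader` is the BufferedReader the lines are read from
StringBuilder problem = new StringBuilder();
StringBuilder resolution = new StringBuilder();

//...

// If the current line starts with "Problem:" (compared in lower case)
if (inputLine.toLowerCase().startsWith("problem:")) {
   // Continue appending to the string builder until the delimiting line is reached
   while (inputLine != null && !inputLine.toLowerCase().startsWith("resolution")) {
      problem.append(inputLine).append(' ');
      inputLine = reader.readLine(); // advance, otherwise the loop never ends
   }
}

// Something similar for resolution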
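
Once both builders are filled, the collected text still has to fit on a single delimited line, which is the end goal stated in the question. The following is a rough sketch of that last step, assuming a pipe delimiter and CSV-style quoting; the class name `DelimitedWriter` and both of those choices are illustrative and not taken from the original post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DelimitedWriter {
    /** Flattens one field: collapses whitespace runs (including newlines), quotes if needed. */
    static String clean(String value, char delimiter) {
        String flat = value.trim().replaceAll("\\s+", " ");
        if (flat.indexOf(delimiter) >= 0 || flat.indexOf('"') >= 0) {
            // CSV-style quoting: wrap in quotes and double any embedded quotes.
            flat = "\"" + flat.replace("\"", "\"\"") + "\"";
        }
        return flat;
    }

    /** Joins the cleaned fields into one output line. */
    static String toLine(List<String> fields, char delimiter) {
        return fields.stream()
                .map(f -> clean(f, delimiter))
                .collect(Collectors.joining(String.valueOf(delimiter)));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
                "abc123",
                "7/3/12",
                "foo bar in multiple lines of text\n   which may include <newline>");
        System.out.println(toLine(fields, '|'));
    }
}
```

Collapsing whitespace loses the original line breaks inside Problem and Resolution. If those matter downstream, replacing them with a visible marker instead would be a reasonable alternative.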
成全新的幸福
Reply #4 · 2019-06-13 14:28

I'm going to be a little bit bold here and suggest the use of a real parser generator, such as JavaCC.

You mention in your question that there are only two fields that are freeform, but perhaps there might be others that are added in the future as freeform? Hardcoding two fields to be handled differently can have lots of side effects when a third, fourth or nth special case is added.

JavaCC will generate a real parser for you without requiring any additional jars at runtime, and even better, will allow you to think about your parsing rules so that special cases in the future will not cause you any grief.
