I have a text file dump that I need to convert to a delimited file. The file contains a series of "records" (for lack of a better word) formatted like this:
User: abc123
Date: 7/3/12
Subject: the foo is bar
Project: 123456
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text
User: abc123
Date: 7/3/12
Subject: the foo is bar
Project: 234567
Problem: foo bar in multiple lines of text
which may include <newline> and
extend to multiple lines of text
Resolution: foo un-barred in multiple lines of text
...
Right now, with Java, I'm using StringBuffer to read this file line-by-line, parsing the lines into individual fields based on a series of if(inputLine.toLowerCase().startsWith("user:")) checks, and then writing a final delimited line to a text file.
However, the fields Problem and Resolution are free form and may be multi-line. I'm trying to build two Strings: one that appends all lines following Problem: and ending at Resolution:, and one that appends all lines starting after Resolution: and ending at Form:.
I've already viewed this link and this link, which suggest that StringBuilder might be an appropriate way to do this. However, I'm not quite sure how to construct the logic.
EDIT: Since I'm reading line-by-line, I'm having a hard time wrapping my head around how to code
<pseudocode>
If the line starts with "Problem", extract the characters after "Problem"; else
if the PRIOR line starts with "Problem" and the current line doesn't start with "Resolution", then append the characters in the line to the prior line
etc.
</pseudocode>
but then, what if there's a third line of "Problem"...? I just can't visualize how to make it work.
Any ideas or alternate methods of achieving my desired results?
Hi, if I understand your problem correctly, then something along these lines should work:
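A minimal sketch of the idea, not a definitive implementation: keep a reference to the StringBuilder of the field currently being filled, start a new one whenever a line begins with a known label, and otherwise append the line to the current one. That way it does not matter whether Problem runs for two lines or twenty. The class name RecordParser, the FIELDS list (taken from the sample in the question), the assumption that User: opens every record, and the choice to join continuation lines with a single space are all assumptions made for illustration. If the real file also has a Form: label, it would need to be added to FIELDS.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecordParser {
    // Labels in output column order (assumed from the sample; "User" opens a record).
    static final String[] FIELDS = {"User", "Date", "Subject", "Project", "Problem", "Resolution"};

    // Returns the label the line starts with (case-insensitive), or null for a continuation line.
    static String labelOf(String line) {
        for (String f : FIELDS) {
            if (line.regionMatches(true, 0, f + ":", 0, f.length() + 1)) {
                return f;
            }
        }
        return null;
    }

    // Converts the whole dump to one delimited line per record.
    static String toDelimited(String text, String delimiter) {
        StringBuilder out = new StringBuilder();
        Map<String, StringBuilder> record = new LinkedHashMap<>();
        StringBuilder current = null; // buffer of the field being filled right now

        for (String line : text.split("\\R")) {
            String label = labelOf(line);
            if (label != null) {
                // A new "User:" line means the previous record is complete.
                if (label.equals("User") && !record.isEmpty()) {
                    flush(record, delimiter, out);
                }
                current = new StringBuilder(line.substring(label.length() + 1).trim());
                record.put(label, current);
            } else if (current != null && !line.trim().isEmpty()) {
                // No label: this line continues whichever field came last.
                current.append(' ').append(line.trim());
            }
        }
        if (!record.isEmpty()) {
            flush(record, delimiter, out); // last record in the file
        }
        return out.toString();
    }

    // Writes the collected fields in FIELDS order, then resets the record.
    static void flush(Map<String, StringBuilder> record, String delimiter, StringBuilder out) {
        List<String> cells = new ArrayList<>();
        for (String f : FIELDS) {
            StringBuilder value = record.get(f);
            cells.add(value == null ? "" : value.toString());
        }
        out.append(String.join(delimiter, cells)).append('\n');
        record.clear();
    }
}
```

Calling RecordParser.toDelimited(text, "|") returns one pipe-separated line per record. The same loop works unchanged if the lines come from BufferedReader.readLine() instead of split. One caveat: a free-form line that happens to begin with a label such as Date: would be taken as the start of that field, so the delimiter and label list should be chosen with the real data in mind.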
I'm going to be a little bit bold here and suggest the use of a real parser generator, such as JavaCC.
You mention in your question that there are only two fields that are freeform, but perhaps there might be others that are added in the future as freeform? Hardcoding two fields to be handled differently can have lots of side effects when a third, fourth or nth special case is added.
JavaCC will generate a real parser for you without requiring any additional jars at runtime, and even better, will allow you to think about your parsing rules so that special cases in the future will not cause you any grief.