Handling NEWLINE in DelimitedRecord with Filehelpe

2019-05-04 17:22发布

问题:

Am using the excellent FileHelpers library to parse a number of different files. One of these files has (some) lines that look like this

id|name|comments|date
01|edov|bla bla bla bla|2012-01-01
02|john|bla bla bla bla|2012-01-02
03|Pete|bla bla <NEWLINE>
bla bla|2012-03-01
04|Mary|bla bla bla bla|2012-01-01

Note that line with id 3 has a newline in the text. Also note the comments are not surrounded by quotes so [FieldQuoted('"', MultilineMode.AllowForRead)] isn't going to save me.

Filehelpers throws an exception on line 4:

Delimiter '|' not found after field 'comments' (the record has less fields, the delimiter is wrong or the next field must be marked as optional).

Is there anyway I can parse this file with FileHelpers?

回答1:

You have to correct the input by adding quotes to the third field before passing it to the FileHelpers engine. It's easy to do with LINQ as demonstrated in the following program.

[DelimitedRecord("|")]
[IgnoreFirst(1)]
public class ImportRecord
{
    public string Id;
    public string Name;
    [FieldQuoted(QuoteMode.AlwaysQuoted, MultilineMode.AllowForRead)]
    public string Comments;
    public string Date;
}

class Program
{
    static void Main(string[] args)
    {
        var engine = new FileHelperEngine<ImportRecord>();

        string input = "id|name|comments|date" + Environment.NewLine +
                              "01|edov|bla bla bla bla|2012-01-01" + Environment.NewLine +
                              "02|john|bla bla bla bla|2012-01-02" + Environment.NewLine +
                              "03|Pete|bla bla" + Environment.NewLine +
                              "bla bla|2012-03-01" + Environment.NewLine +
                              "04|Mary|bla bla bla bla|2012-01-01";

        // Modify import to add quotes to multiline fields
        var inputSplitAtSeparator = input.Split('|');
        // Add quotes around the field after every third '|'
        var inputWithQuotesAroundCommentsField = inputSplitAtSeparator.Select((x, i) => (i % 3 == 2) ? "\"" + x + "\"" : x);
        var inputJoinedBackTogether = String.Join("|", inputWithQuotesAroundCommentsField);

        ImportRecord[] validRecords = engine.ReadString(inputJoinedBackTogether);

        // Check the third record
        Assert.AreEqual("03", validRecords[2].Id);
        Assert.AreEqual("Pete", validRecords[2].Name);
        Assert.AreEqual("bla bla" + Environment.NewLine + "bla bla", validRecords[2].Comments);
        Assert.AreEqual("2012-03-01", validRecords[2].Date);

        Console.WriteLine("All assertions passed");
        Console.ReadKey();
    }
}