Best way to parse a plain text file with a nested in

Posted 2019-07-18 08:23

Question:

The text file contains hundreds of entries like this one (the format is an MT940 bank statement):

{1:F01AHHBCH110XXX0000000000}{2:I940X           N2}{3:{108:XBS/091502}}{4:
:20:XBS/091202/0001
:25:5887/507004-50
:28C:140/1
:60F:C0914CHF7789,
:61:0912021202D36,80NTRFNONREF//0887-1202-29-941
04392579-0 LUTHY + xxx, ZUR
:86:6034?60LUTHY + xxxx, ZUR vom 01.12.09 um 16:28 Karten-Nr. 2232
2579-0
:62F:C091202CHF52,2
:64:C091302CHF52,2
-}

These should be parsed into an array of hashes, like this:

[{"1" => "F01AHHBCH110XXX0000000000",
  "2" => "I940X           N2",
  "3" => {"108" => "XBS/091502"},
  # etc.
 }]
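For the brace-delimited blocks, a small hand-written recursive-descent parser may be simpler than a grammar tool. The following is a minimal sketch using Ruby's standard `StringScanner`, not a complete MT940 parser. It assumes that every block has the form `{KEY:VALUE}`, that a value is either plain text without braces or a run of nested blocks, and that consecutive statements are separated by whitespace; `parse_blocks` and `parse_entries` are hypothetical names.

```ruby
require "strscan"

# Sketch only. Assumes blocks look like {KEY:VALUE}, KEY has no ':', '{' or '}',
# and VALUE is either brace-free text or a sequence of nested blocks.
def parse_blocks(scanner)
  result = {}
  while scanner.scan(/\{/)
    key = scanner.scan(/[^:{}]+/) or raise ArgumentError, "missing key at offset #{scanner.pos}"
    scanner.scan(/:/) or raise ArgumentError, "missing ':' after key #{key}"
    result[key] =
      if scanner.check(/\{/)
        parse_blocks(scanner)      # nested blocks, e.g. {3:{108:...}}
      else
        scanner.scan(/[^{}]*/)     # plain text up to the closing brace
      end
    scanner.scan(/\}/) or raise ArgumentError, "missing '}' for block #{key}"
  end
  result
end

# Assumption: statements are separated by whitespace (e.g. a newline).
def parse_entries(text)
  scanner = StringScanner.new(text)
  entries = []
  loop do
    scanner.skip(/\s*/)
    break if scanner.eos?
    raise ArgumentError, "expected '{' at offset #{scanner.pos}" unless scanner.check(/\{/)
    entries << parse_blocks(scanner)
  end
  entries
end

# Usage with a shortened copy of the sample entry (block 4 trimmed), repeated twice.
entry = "{1:F01AHHBCH110XXX0000000000}{2:I940X           N2}{3:{108:XBS/091502}}{4:\n:20:XBS/091202/0001\n-}"
entries = parse_entries([entry, entry].join("\n"))
puts entries.size                # prints 2
puts entries.first["3"]["108"]   # prints XBS/091502
```

The value of block `"4"` comes back as raw text (`"\n:20:XBS/091202/0001\n-"`), which still needs splitting into its `:TAG:` fields.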

I tried Treetop, but it didn't seem like the right tool: it seems geared toward evaluating what it parses (for example, doing calculations), whereas I just want to extract the information. Here is my grammar:

grammar Mt940

  # (The `string` and `spaces` rules referenced below were not included in the question.)
  rule document
    part1:string spaces [:|/] spaces part2:document 
    {
      def eval(env={})
        return part1.eval, part2.eval
      end
    }
    / string
    /  '{' spaces document spaces '}' spaces
    {
      def eval(env={})
        return [document.eval]
      end
    }
  end
end

I also tried a regular expression:

matches = str.scan(/\A[{]?([0-9]+)[:]?([^}]*)[}]?\Z/i)

but handling the nested (recursive) braces with a regular expression is difficult.
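Ruby's regex engine (Onigmo) does support recursion through subexpression calls (`\g<name>`), so one pattern can match a block with arbitrarily nested braces. Captures taken inside a recursive call are unreliable, so this sketch uses the regex only to find whole top-level blocks and handles the nesting with ordinary Ruby recursion. `BLOCK`, `blocks_to_hash` and `parse_statements` are hypothetical names, and splitting statements just before each `{1:` is an assumption about the file layout.

```ruby
# \g<block> re-enters the named group, so nested {...} blocks are matched whole.
# The two alternatives (a non-brace char, or a nested block) are disjoint,
# which avoids runaway backtracking.
BLOCK = /(?<block>\{(?:[^{}]|\g<block>)*\})/

# Converts a run of {KEY:VALUE} blocks into a hash, recursing for nested values.
def blocks_to_hash(text)
  text.scan(BLOCK).map(&:first).each_with_object({}) do |block, hash|
    key, body = block[1...-1].split(":", 2)  # drop outer braces, split at first ':'
    body ||= ""
    hash[key] = body.start_with?("{") ? blocks_to_hash(body) : body
  end
end

# Assumption: each statement begins with "{1:", so split just before each one.
def parse_statements(text)
  text.split(/(?=\{1:)/).map { |chunk| blocks_to_hash(chunk) }.reject(&:empty?)
end

header = "{1:F01AHHBCH110XXX0000000000}{2:I940X           N2}{3:{108:XBS/091502}}"
puts blocks_to_hash(header)["3"]["108"]   # prints XBS/091502
```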

How can I solve this problem?

Answer 1:

There are several open-source MT940 parsers available in Java and PHP. You could study their source code and port one to Ruby. If you are on JRuby, you can call a Java parser directly from your Ruby code.
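If you do port or write a parser, most of the work is in the `{4:` block: each field starts with a `:TAG:` marker and may continue onto following lines, as `:61:` and `:86:` do in the sample above. As a hedged starting point, here is a small Ruby sketch; `parse_fields` is a hypothetical name, and it assumes that any line not starting with `:TAG:` continues the previous field and that the closing `-` line carries no data. It returns an array of `[tag, value]` pairs rather than a hash because tags such as `:61:` and `:86:` can repeat within one statement.

```ruby
# Hypothetical helper: splits the body of an MT940 {4: ...} block into [tag, value] pairs.
# Assumes a field starts with ":TAG:" at the beginning of a line and that any other
# line continues the previous field. The terminating "-" line is skipped.
def parse_fields(body)
  fields = []
  body.each_line(chomp: true) do |line|
    next if line.empty? || line == "-"
    if (m = line.match(/\A:(\w+):(.*)\z/))
      fields << [m[1], m[2]]
    elsif fields.any?
      fields.last[1] += "\n#{line}"   # continuation of the previous field
    end
  end
  fields
end

body = <<~MT940
  :20:XBS/091202/0001
  :61:0912021202D36,80NTRFNONREF//0887-1202-29-941
  04392579-0 LUTHY + xxx, ZUR
  :62F:C091202CHF52,2
  -
MT940
parse_fields(body).each { |tag, value| puts "#{tag} => #{value.inspect}" }
```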

Another option is to use the OFX gem, which parses OFX files. Since your file is in MT940 format, you would first have to convert it to OFX using one of the free converters available. This approach is practical if you are importing in a batch job, for example.

References:

- MT940 Java parser
- MT940 to OFX Converter 1
- MT940 to OFX Converter 2