Processing a repeatedly structured text file with Python

I have a big text file structured in blocks like:

    Student = {
        PInfo = {
            ID   = 0001;
            Name.First = "Joe";
            Name.Last = "Burger";
            DOB  = "01/01/2000";
        };
        School = "West High";
        Address = {
            Str1 = "001 Main St.";
            Zip = 12345;
        };
    };
    Student = {
        PInfo = {
            ID   = 0002;
            Name.First = "John";
            Name.Last = "Smith";
            DOB  = "02/02/2002";
        };
        School = "East High";
        Address = {
            Str1 = "001 40nd St.";
            Zip = 12346;
        };
        Club = "Football";
    };
    ....

The Student blocks all share entries such as "PInfo", "School" and "Address", but some of them may have additional entries, such as the "Club" information for "John Smith", which is not present for "Joe Burger". What I want is to get the name, school name and zip code of each student and store them in a dictionary, like:

    {'Joe Burger': {'School': 'West High', 'Zip': 12345}, 'John Smith': {'School': 'East High', 'Zip': 12346}, ...}

Being new to Python programming, I tried to open the file and analyze it line by line, but that gets cumbersome. The real file is also quite large and more complicated than the example above. Is there an easier way to do this? Thanks in advance.

Tags: python parsing text parser-generator

3 Answers

Answered by (banned account) · 2019-01-18 16:46

It's not JSON, but it is similarly structured, so you should be able to reformat it into JSON:

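1. Replace "=" with ":".
2. Wrap every key in double quotes.
3. Replace ";" with ",".
4. Strip leading zeros from integers such as 0001, which JSON does not allow.
5. Wrap the whole text in curly braces.
6. Remove every "," that is followed (possibly after whitespace) by a "}".
7. Parse it with json.loads, passing an object_pairs_hook so the repeated "Student" keys are not collapsed into one (a plain dict keeps only the last value of a duplicated key).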
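A minimal Python 3 sketch of this conversion, assuming that "=", ";" and ":" never occur inside the quoted values. It wraps the text in braces before removing trailing commas (so the comma after the last block is caught too), strips leading zeros such as 0001 (invalid in JSON), and passes an object_pairs_hook so every repeated "Student" block survives. The function names to_json and students_from_text are hypothetical.

```python
import json
import re


def to_json(text):
    """Rewrite the 'Name = value;' blocks as JSON text.

    Assumes '=', ';' and ':' never appear inside quoted strings.
    """
    text = text.replace('=', ':')                      # "=" -> ":"
    # Quote every key: a bare name directly followed by a colon.
    text = re.sub(r'([A-Za-z][A-Za-z0-9.]*)\s*:', r'"\1":', text)
    text = text.replace(';', ',')                      # ";" -> ","
    # JSON forbids leading zeros, so 0001 becomes 1
    # (a zip code such as 01234 would also lose its leading zero).
    text = re.sub(r':\s*0+(\d)', r': \1', text)
    # Wrap in braces first, so the comma after the last block is also
    # followed by "}" and gets removed in the next step.
    text = '{' + text + '}'
    return re.sub(r',(\s*})', r'\1', text)             # drop trailing commas


def students_from_text(text):
    """Return {'First Last': {'School': ..., 'Zip': ...}} for each Student."""
    # A plain json.loads keeps only the last value of a repeated key, so the
    # hook returns the raw list of (key, value) pairs for every object.
    top = json.loads(to_json(text), object_pairs_hook=lambda pairs: pairs)
    result = {}
    for name, block in top:
        if name != 'Student':
            continue
        student = dict(block)
        info = dict(student['PInfo'])
        address = dict(student['Address'])
        full_name = info['Name.First'] + ' ' + info['Name.Last']
        result[full_name] = {'School': student['School'], 'Zip': address['Zip']}
    return result
```

Calling students_from_text on the file's contents should give the dictionary from the question. These are plain text substitutions, so a value containing "=", ";" or ":" would break them; a grammar-based parser is more robust for irregular input.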

Answered by 在下西门庆 · 2019-01-18 16:47

To parse the file you could define a grammar that describes your input format and use it to generate a parser.

There are many parser generators for Python. For example, you could use Grako, which takes grammars in a variation of EBNF as input and outputs memoizing PEG parsers in Python.

To install Grako, run pip install grako.

Here's a grammar for your format in Grako's flavor of EBNF:

(* a file is zero or more records *)
file = { record }* $;
record = name '=' value ';' ;
name = /[A-Z][a-zA-Z0-9.]*/ ;
value = object | integer | string ;
(* an object contains one or more records *)
object = '{' { record }+ '}' ;
integer = /[0-9]+/ ;
string = '"' /[^"]*/ '"';

To generate the parser, save the grammar to a file, e.g. Structured.ebnf, and run:

$ grako -o structured_parser.py Structured.ebnf

This creates a structured_parser module that can be used to extract the student information from the input:

#!/usr/bin/env python
from structured_parser import StructuredParser

class Semantics(object):
    def record(self, ast):
        # record = name '=' value ';' ;
        # value = object | integer | string ;
        return ast[0], ast[2] # name, value
    def object(self, ast):
        # object = '{' { record }+ '}' ;
        return dict(ast[1])
    def integer(self, ast):
        # integer = /[0-9]+/ ;
        return int(ast)
    def string(self, ast):
        # string = '"' /[^"]*/ '"';
        return ast[1]

with open('input.txt') as file:
    text = file.read()
parser = StructuredParser()
ast = parser.parse(text, rule_name='file', semantics=Semantics())
students = [value for name, value in ast if name == 'Student']
d = {'{0[Name.First]} {0[Name.Last]}'.format(s['PInfo']):
     dict(School=s['School'], Zip=s['Address']['Zip'])
     for s in students}
from pprint import pprint
pprint(d)

Output

{'Joe Burger': {'School': u'West High', 'Zip': 12345},
 'John Smith': {'School': u'East High', 'Zip': 12346}}
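The same grammar can also be implemented by hand as a small recursive-descent parser, with one method per grammar rule. This is a swapped-in technique, not the code Grako generates: a minimal sketch for Python 3.6+, where the names TOKEN_RE, tokenize, Parser and extract are hypothetical. As with the Semantics class above, object blocks become dicts, integers become int, and the top level stays a list of (name, value) pairs so the repeated Student blocks are not lost.

```python
import re

# One alternative per token kind, tried in order at the current position.
TOKEN_RE = re.compile(
    r'"(?P<string>[^"]*)"'            # string  = '"' /[^"]*/ '"'
    r'|(?P<integer>[0-9]+)'           # integer = /[0-9]+/
    r'|(?P<name>[A-Z][a-zA-Z0-9.]*)'  # name    = /[A-Z][a-zA-Z0-9.]*/
    r'|(?P<punct>[={};])'
    r'|(?P<space>\s+)'
)


def tokenize(text):
    """Yield (kind, value) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f'unexpected character {text[pos]!r} at offset {pos}')
        pos = m.end()
        kind = m.lastgroup
        if kind == 'integer':
            yield kind, int(m.group(kind))
        elif kind != 'space':
            yield kind, m.group(kind)


class Parser:
    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ('eof', None)

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f'expected {value or kind}, got {tok[1]!r}')
        self.i += 1
        return tok[1]

    def file(self):    # file = { record }* $ ;
        records = []
        while self.peek()[0] != 'eof':
            records.append(self.record())
        return records

    def record(self):  # record = name '=' value ';' ;
        name = self.expect('name')
        self.expect('punct', '=')
        value = self.value()
        self.expect('punct', ';')
        return name, value

    def value(self):   # value = object | integer | string ;
        kind, val = self.peek()
        if (kind, val) == ('punct', '{'):
            return self.object()
        if kind in ('integer', 'string'):
            self.i += 1
            return val
        raise SyntaxError(f'unexpected {val!r}')

    def object(self):  # object = '{' { record }+ '}' ;
        self.expect('punct', '{')
        records = [self.record()]
        while self.peek() != ('punct', '}'):
            records.append(self.record())
        self.expect('punct', '}')
        return dict(records)


def extract(text):
    students = [v for name, v in Parser(text).file() if name == 'Student']
    return {
        f"{s['PInfo']['Name.First']} {s['PInfo']['Name.Last']}":
            {'School': s['School'], 'Zip': s['Address']['Zip']}
        for s in students
    }
```

Because the tokenizer and parser raise SyntaxError at the first unexpected token, malformed input is reported instead of being silently skipped.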

Answered by Evening l夕情丶 · 2019-01-18 16:53

For this kind of thing, I use Marpa::R2, a Perl interface to Marpa, a general BNF parser. It lets you describe the text as grammar rules and parse it into a tree of arrays (a parse tree). You can then traverse the tree to save the results as a hash of hashes (a hash is Perl's equivalent of a Python dictionary) or use it as is.
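Marpa::R2 is Perl, but the traversal step carries over directly to Python. The sketch below assumes a hypothetical parse tree shaped as nested (name, value) pairs, where a nested block is a list of pairs and a leaf is an int or str; this is an illustrative shape, not Marpa's actual output format. A depth-first walk flattens each Student block into dotted paths such as 'Address.Zip', from which the wanted fields are easy to pick out. The names TREE, walk and students are hypothetical.

```python
# Hypothetical parse tree: lists of (name, value) pairs; nested blocks are
# lists again, leaves are int or str.
TREE = [
    ('Student', [
        ('PInfo', [('ID', 1), ('Name.First', 'Joe'), ('Name.Last', 'Burger'),
                   ('DOB', '01/01/2000')]),
        ('School', 'West High'),
        ('Address', [('Str1', '001 Main St.'), ('Zip', 12345)]),
    ]),
    ('Student', [
        ('PInfo', [('ID', 2), ('Name.First', 'John'), ('Name.Last', 'Smith'),
                   ('DOB', '02/02/2002')]),
        ('School', 'East High'),
        ('Address', [('Str1', '001 40nd St.'), ('Zip', 12346)]),
        ('Club', 'Football'),
    ]),
]


def walk(pairs, prefix=''):
    """Depth-first traversal: yield (dotted_path, leaf_value) for every leaf."""
    for name, value in pairs:
        path = prefix + name
        if isinstance(value, list):      # nested block: descend
            yield from walk(value, path + '.')
        else:                            # leaf: int or str
            yield path, value


def students(tree):
    """Collect name -> {School, Zip} from the top-level Student blocks."""
    result = {}
    for name, block in tree:
        if name != 'Student':
            continue
        flat = dict(walk(block))         # e.g. {'PInfo.Name.First': 'Joe', ...}
        full_name = flat['PInfo.Name.First'] + ' ' + flat['PInfo.Name.Last']
        result[full_name] = {'School': flat['School'], 'Zip': flat['Address.Zip']}
    return result
```

Note that keys such as Name.First already contain a dot, so the dotted paths are a convenience for lookups rather than an unambiguous encoding of the tree.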

I cooked a working example using your input: parser, result tree.

Hope this helps.

P.S. Example of ast_traverse(): Parse values from a block of text based on specific keys
