I have a big text file structured in blocks like:
Student = {
PInfo = {
ID = 0001;
Name.First = "Joe";
Name.Last = "Burger";
DOB = "01/01/2000";
};
School = "West High";
Address = {
Str1 = "001 Main St.";
Zip = 12345;
};
};
Student = {
PInfo = {
ID = 0002;
Name.First = "John";
Name.Last = "Smith";
DOB = "02/02/2002";
};
School = "East High";
Address = {
Str1 = "001 40nd St.";
Zip = 12346;
};
Club = "Football";
};
....
The Student blocks share the same entries, such as "PInfo", "School" and "Address", but some of them have additional entries: "John Smith" has a "Club" entry, for example, which "Joe Burger" does not. I want to extract the name, school and zip code of each student and store them in a dictionary like
{'Joe Burger': {'School': 'West High', 'Zip': 12345}, 'John Smith': {'School': 'East High', 'Zip': 12346}, ...}
Being new to Python, I tried opening the file and analyzing it line by line, but that quickly became cumbersome, and the real file is much larger and more complicated than the example above. Is there an easier way to do this? Thanks in advance.
It's not JSON, but it is structured similarly. You should be able to reformat it into JSON.
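A rough sketch of that idea, using only the standard library. It assumes every top-level block is labelled Student, that string values never contain quotes, "=", ";" or braces, and that bare values with leading zeros (like the ID) should stay strings. The helper names to_json and summarize are made up for illustration.

```python
import json
import re

def _pair(match):
    """Turn one 'Key = value;' entry into a JSON member '"Key": value,'."""
    key, value = match.group(1), match.group(2).strip()
    is_string = value.startswith('"')
    is_number = re.fullmatch(r'-?(0|[1-9]\d*)(\.\d+)?', value)
    if not (is_string or is_number):
        value = json.dumps(value)  # e.g. 0001 is not a valid JSON number
    return f'"{key}": {value},'

def to_json(text):
    """Rewrite the block format as a JSON array of student objects."""
    # Assumption: each top-level block starts with 'Student = {'.
    text = re.sub(r'\bStudent\s*=\s*\{', '{', text)
    # Nested 'Key = {' becomes '"Key": {'.
    text = re.sub(r'([\w.]+)\s*=\s*\{', r'"\1": {', text)
    # Scalar entries 'Key = value;'.
    text = re.sub(r'([\w.]+)\s*=\s*("[^"]*"|[^;{}"]+?)\s*;', _pair, text)
    # Block terminators '};' become '},'.
    text = re.sub(r'\}\s*;', '},', text)
    # Wrap in an array, then drop the trailing commas JSON does not allow.
    return re.sub(r',\s*([}\]])', r'\1', '[' + text + ']')

def summarize(text):
    """Build {full name: {'School': ..., 'Zip': ...}} from the raw text."""
    result = {}
    for student in json.loads(to_json(text)):
        pinfo = student["PInfo"]
        name = pinfo["Name.First"] + " " + pinfo["Name.Last"]
        result[name] = {"School": student["School"],
                        "Zip": student["Address"]["Zip"]}
    return result
```

Calling summarize(open(path).read()) on a file in this format should give the dictionary asked for. The regular expressions rewrite text blindly, so this breaks as soon as a string value contains one of the delimiters; a real parser is safer for messy input.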
To parse the file you could define a grammar that describes your input format and use it to generate a parser.
There are many language parsers in Python. For example, you could use Grako that takes grammars in a variation of EBNF as input, and outputs memoizing PEG parsers in Python.
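If you would rather avoid a dependency, the same grammar idea can be sketched by hand. The following is not Grako output, just a small recursive-descent parser written for illustration. It assumes the format is a sequence of "key = value;" entries where a value is a quoted string, a bare word, or a "{ ... }" block of further entries; tokenize and parse are hypothetical names.

```python
import re

# One token: a quoted string, a punctuation symbol, or a bare word.
TOKEN = re.compile(
    r'\s*(?:(?P<str>"[^"]*")|(?P<sym>[{};=])|(?P<word>[^\s{};="]+))')

def tokenize(text):
    """Yield (kind, value) pairs; kind is 'str', 'sym' or 'word'."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:  # e.g. an unterminated string
            raise SyntaxError(f"unexpected input at offset {pos}")
        pos = match.end()
        yield match.lastgroup, match.group(match.lastgroup)

def parse(text):
    """Parse the text into a list of (key, value) pairs.

    A value is a str, an int, or a nested list of pairs. Lists are used
    instead of dicts so that repeated keys such as Student survive.
    """
    tokens = list(tokenize(text))
    pos = 0

    def take(kind=None, value=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if (kind and tok[0] != kind) or (value and tok[1] != value):
            raise SyntaxError(f"unexpected token {tok[1]!r}")
        pos += 1
        return tok

    def block():
        # block := { KEY '=' (STRING | WORD | '{' block '}') ';' }
        entries = []
        while pos < len(tokens) and tokens[pos] != ("sym", "}"):
            _, key = take("word")
            take("sym", "=")
            kind, value = take()
            if (kind, value) == ("sym", "{"):
                value = block()
                take("sym", "}")
            elif kind == "str":
                value = value[1:-1]  # strip the quotes
            elif kind == "word":
                # Keep values like 0001 as strings to preserve the zeros.
                if re.fullmatch(r"0|[1-9]\d*", value):
                    value = int(value)
            else:
                raise SyntaxError(f"unexpected token {value!r}")
            take("sym", ";")
            entries.append((key, value))
        return entries

    tree = block()
    if pos != len(tokens):
        raise SyntaxError("unbalanced '}'")
    return tree
```

parse(text) returns a list with one ("Student", [...]) pair per block, which can then be walked to pick out the fields of interest.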
To install Grako, run
pip install grako
Here's a grammar for your format, written in Grako's flavor of EBNF:
To generate the parser, save the grammar to a file, e.g.,
Structured.ebnf
and run:
It creates the
structured_parser
module, which can be used to extract the student information from the input:
Output:
For this kind of thing, I use Marpa::R2, a Perl interface to Marpa, a general BNF parser. It lets you describe the text as a set of grammar rules and parse it into a tree of arrays (a parse tree). You can then traverse the tree to save the results as a hash of hashes (a hash is Perl's equivalent of a Python dictionary) or use the tree as is.
I cooked a working example using your input: parser, result tree.
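For readers who prefer Python, the traversal step might look roughly like the sketch below. It is not a translation of the Marpa code: the tree layout (nested lists of (key, value) pairs) and the helpers first and summarize are assumptions made for illustration, and the small tree is written out by hand from the first student in the question.

```python
def first(entries, key):
    """Return the value of the first (key, value) pair with this key."""
    for k, v in entries:
        if k == key:
            return v
    raise KeyError(key)

def summarize(tree):
    """Walk the tree and build {full name: {'School': ..., 'Zip': ...}}."""
    result = {}
    for key, student in tree:
        if key != "Student":
            continue  # ignore any other top-level entries
        pinfo = first(student, "PInfo")
        name = first(pinfo, "Name.First") + " " + first(pinfo, "Name.Last")
        result[name] = {
            "School": first(student, "School"),
            "Zip": first(first(student, "Address"), "Zip"),
        }
    return result

# Hand-written tree for the first student in the question.
tree = [
    ("Student", [
        ("PInfo", [("ID", "0001"), ("Name.First", "Joe"),
                   ("Name.Last", "Burger"), ("DOB", "01/01/2000")]),
        ("School", "West High"),
        ("Address", [("Str1", "001 Main St."), ("Zip", 12345)]),
    ]),
]
print(summarize(tree))
```

Extra entries such as Club are simply never looked up, so blocks with additional fields need no special handling.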
Hope this helps.
P.S. Example of
ast_traverse()
: Parse values from a block of text based on specific keys