I'm writing a parser in Emacs Lisp. It's a parser for text files looking like this:
rule:
int: 1, 2, 3, ...
string: and, or, then, when
text:
----------
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pellentesque
in tellus. In pharetra consequat augue. In congue. Curabitur
pellentesque iaculis eros. Proin magna odio, posuere sed, commodo nec,
varius nec, tortor.
----------
more: ...
rule:
...
I don't really care about the key (int, string, ...). I want the value. So for the file above int has value "1, 2, 3, ...", string "and, or, then, when" and text "Lorem ..." (excluding the dashes).
I'm thinking about two different solutions, but I don't know which one to use. Should I:
create a simple parser that loops through all lines, matches each line against some regex, and pulls out the parts I want with capture groups?
do a more sophisticated parser with a lexer and a parser?
Right now the files are quite simple and I guess I don't need anything as advanced as the second option. But these files may get a bit more complicated, so I want to make it easy to extend.
How would you solve this?
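For reference, the first option (line-by-line matching) might be sketched roughly as follows. This is only a minimal sketch: the function name my-parse-rules-buffer is hypothetical, and the format rules (a "rule:" line starts a new rule, "key: value" sits on one line, and a key followed by a dashed line opens a multi-line block closed by another dashed line) are assumptions inferred from the sample file above.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

;; Hypothetical sketch; the format rules are inferred from the sample file.
(defun my-parse-rules-buffer ()
  "Parse the current buffer into a list of rules.
Each rule is an alist of (KEY . VALUE) string pairs, in file order."
  (let ((rules '())
        (current '())
        (started nil))
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (cond
         ;; A bare "rule:" line starts a new rule.
         ((looking-at "^rule:[ \t]*$")
          (when started (push (nreverse current) rules))
          (setq current '() started t)
          (forward-line 1))
         ;; "key:" followed by a dashed line: the value is everything
         ;; up to the next dashed line, excluding the dashes.
         ((looking-at "^\\([^:\n]+\\):[ \t]*\n-+[ \t]*\n")
          (let ((key (match-string 1))
                (start (match-end 0)))
            (goto-char start)
            (unless (re-search-forward "^-+[ \t]*$" nil t)
              (error "Unterminated block for key %s" key))
            (push (cons key (string-trim
                             (buffer-substring-no-properties
                              start (match-beginning 0))))
                  current)
            (forward-line 1)))
         ;; Plain "key: value" on a single line.
         ((looking-at "^\\([^:\n]+\\):[ \t]*\\(.*\\)$")
          (push (cons (match-string 1) (match-string 2)) current)
          (forward-line 1))
         ;; Anything else (blank lines, stray text) is skipped.
         (t (forward-line 1)))))
    (when started (push (nreverse current) rules))
    (nreverse rules)))

;; Usage: run the parser on a small sample in a temporary buffer.
(defun my-parse-rules-string (text)
  "Parse TEXT with `my-parse-rules-buffer' and return the rules."
  (with-temp-buffer
    (insert text)
    (my-parse-rules-buffer)))
```

Keeping each line shape in its own cond clause means a new kind of line can be added as one more clause, which may be enough until the format needs real nesting.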
For parser work, take a look at the Semantic library from the CEDET project.
Are you already familiar with recursive descent parsers? They're relatively easy to write by hand in your favourite programming language, which would include Emacs Lisp. For very simple parsing, you can often get by with looking-at and search-forward. These would also form the basis of any tokenizing routines that would be called by your recursive descent parser, or any other style of parser.
[11 Feb 2009] I added an example recursive descent parser in Emacs Lisp below. It parses simple arithmetic expressions including addition, subtraction, multiplication, division, exponentiation, and parenthesized sub-expressions. Right now, it assumes all tokens are in the global variable *tokens*, but if you modify gettok and peektok as necessary you can have them walk through a buffer.
To use it as is, just try out the following:
The parsing code follows.
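As a rough illustration of the kind of parser described above (not the original listing), here is a minimal sketch. Only *tokens*, gettok and peektok are named in the text; everything else is an assumption: the function names parse-expr, parse-term, parse-factor, parse-primary and parse-tokens, the token representation (numbers plus operator strings), and the choice to evaluate the expression directly rather than build a tree.

```elisp
;; Hypothetical sketch of a recursive descent parser for arithmetic.
;; Assumed grammar:
;;   expr    := term   { ("+" | "-") term }
;;   term    := factor { ("*" | "/") factor }
;;   factor  := primary [ "^" factor ]        (right-associative)
;;   primary := NUMBER | "(" expr ")"

(defvar *tokens* nil
  "Remaining input tokens: numbers and the strings + - * / ^ ( ).")

(defun peektok ()
  "Return the next token without consuming it, or nil at end of input."
  (car *tokens*))

(defun gettok ()
  "Consume and return the next token."
  (pop *tokens*))

(defun parse-primary ()
  "Parse a number or a parenthesized sub-expression."
  (let ((tok (gettok)))
    (cond
     ((numberp tok) tok)
     ((equal tok "(")
      (let ((value (parse-expr)))
        (unless (equal (gettok) ")")
          (error "Expected closing parenthesis"))
        value))
     (t (error "Unexpected token: %S" tok)))))

(defun parse-factor ()
  "Parse exponentiation; recursing on the right makes it right-associative."
  (let ((base (parse-primary)))
    (if (equal (peektok) "^")
        (progn (gettok) (expt base (parse-factor)))
      base)))

(defun parse-term ()
  "Parse multiplication and division, left to right.
Division of two integers is integer division, as with `/'."
  (let ((value (parse-factor)))
    (while (member (peektok) '("*" "/"))
      (let ((op (gettok))
            (rhs (parse-factor)))
        (setq value (if (equal op "*") (* value rhs) (/ value rhs)))))
    value))

(defun parse-expr ()
  "Parse addition and subtraction, left to right."
  (let ((value (parse-term)))
    (while (member (peektok) '("+" "-"))
      (let ((op (gettok))
            (rhs (parse-term)))
        (setq value (if (equal op "+") (+ value rhs) (- value rhs)))))
    value))

(defun parse-tokens (tokens)
  "Evaluate the token list TOKENS, signalling an error on trailing input."
  (let ((*tokens* tokens))
    (prog1 (parse-expr)
      (when *tokens*
        (error "Trailing tokens: %S" *tokens*)))))

;; Usage: (parse-tokens '(1 "+" 2 "*" 3))
```

Each grammar rule maps to one function, and operator precedence falls out of which function calls which. To read from a buffer instead of a list, only gettok and peektok would need to change.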
There is a relatively simple parser you can find on the Emacs Wiki: ParserCompiler