Parsing in Emacs Lisp

I'm writing a parser in Emacs Lisp. It's a parser for text files looking like this:

rule:
  int: 1, 2, 3, ...
  string: and, or, then, when
  text:
  ----------
  Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Pellentesque
  in tellus. In pharetra consequat augue. In congue. Curabitur
  pellentesque iaculis eros. Proin magna odio, posuere sed, commodo nec,
  varius nec, tortor.
  ----------
  more: ...

rule:
  ...

I don't really care about the key (int, string, ...). I want the value. So for the file above, int has the value "1, 2, 3, ...", string has "and, or, then, when", and text has "Lorem ..." (excluding the dashes).

I'm thinking about two different solutions, but I don't know which one to use. Should I:

1. Create a simple parser that loops through all lines, matches each line against some regex, and then pulls out the groups I want?
2. Write a more sophisticated parser with a lexer and a parser?

Right now the files are quite simple, and I guess I don't need anything as advanced as the second option. But these files may get a bit more complicated, so I want to make it easy to extend.

How would you solve this?

Tags: parsing emacs elisp

3 answers

对你真心纯属浪费

#2 · 2019-01-31 15:02

For parsing, look at the Semantic library from the CEDET project.

地球回转人心会变

#3 · 2019-01-31 15:08

Are you already familiar with recursive descent parsers? They're relatively easy to write by hand in your favourite programming language, which would include Emacs Lisp. For very simple parsing, you can often get by with looking-at and search-forward. These would also form the basis of any tokenizing routines that would be called by your recursive descent parser, or any other style of parser.
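As a rough illustration of the looking-at approach applied to the file format in the question, here is a minimal sketch. The function name my-rules-parse-buffer is made up for this example, and the sketch assumes keys are plain letters, fields are indented, and a text block is delimited by two lines consisting only of dashes. It returns one alist of (KEY . VALUE) strings per rule.

```elisp
(require 'subr-x) ; for `string-trim' on older Emacs versions

;; Hypothetical helper, not part of any library: a line-by-line sketch.
(defun my-rules-parse-buffer ()
  "Parse the current buffer; return a list of rules.
Each rule is an alist of (KEY . VALUE) strings."
  (let (rules fields)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (cond
         ;; "rule:" in column 0 starts a new rule.
         ((looking-at "^rule:[ \t]*$")
          (when fields (push (nreverse fields) rules))
          (setq fields nil)
          (forward-line 1))
         ;; "  key:" followed by a dashed line: the value is everything
         ;; up to the next dashed line.
         ((looking-at "^[ \t]+\\([a-z]+\\):[ \t]*\n[ \t]+-+[ \t]*\n")
          (let ((key (match-string-no-properties 1))
                (start (match-end 0)))
            (goto-char start)
            (unless (re-search-forward "^[ \t]+-+[ \t]*$" nil t)
              (error "Unterminated block for key %s" key))
            (push (cons key
                        (string-trim
                         (buffer-substring-no-properties
                          start (match-beginning 0))))
                  fields)
            (forward-line 1)))
         ;; "  key: value" on a single line.
         ((looking-at "^[ \t]+\\([a-z]+\\):[ \t]*\\(.*\\)$")
          (push (cons (match-string-no-properties 1)
                      (match-string-no-properties 2))
                fields)
          (forward-line 1))
         ;; Anything else (blank lines, "...") is skipped.
         (t (forward-line 1)))))
    (when fields (push (nreverse fields) rules))
    (nreverse rules)))
```

Each branch of the cond handles one kind of line, so a new construct is usually one more branch. Only the outer whitespace of a block value is trimmed; inner lines keep their indentation.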

[11 Feb 2009] I added an example recursive descent parser in Emacs Lisp below. It parses simple arithmetic expressions including addition, subtraction, multiplication, division, exponentiation, and parenthesized sub-expressions. Operators are symbols, while parentheses are the strings "(" and ")". Right now, it assumes all tokens are in the global variable *token*, but if you modify gettok and peektok as necessary you can have them walk through a buffer. To use it as is, just try out the following:

(setq *token* '( 3 ^ 5 ^ 7 + 5 * 3 + 7 / 11))
(rdh/expr)
=> (+ (+ (^ 3 (^ 5 7)) (* 5 3)) (/ 7 11))

The parsing code follows.

(defvar *token* nil
  "Remaining input tokens; consumed by `gettok'.")

(defun gettok ()
  "Remove and return the next token, or nil at end of input."
  (and *token* (pop *token*)))

(defun peektok ()
  "Return the next token without consuming it, or nil at end of input."
  (and *token* (car *token*)))

;; expr: factors joined by + or -, left-associative.
(defun rdh/expr ()
  (rdh/expr-tail (rdh/factor)))

(defun rdh/expr-tail (expr)
  (let ((tok (peektok)))
    (cond ((or (null tok)
               (equal tok ")"))
           expr)
          ((member tok '(+ -))
           (gettok)
           (let ((fac (rdh/factor)))
             (rdh/expr-tail (list tok expr fac))))
          (t (error "bad expr")))))

;; factor: terms joined by * or /, left-associative.
(defun rdh/factor ()
  (rdh/factor-tail (rdh/term)))

(defun rdh/factor-tail (fac)
  (let ((tok (peektok)))
    (cond ((or (null tok)
               (member tok '(")" + -)))
           fac)
          ((member tok '(* /))
           (gettok)
           (let ((term (rdh/term)))
             (rdh/factor-tail (list tok fac term))))
          (t (error "bad factor")))))

;; term: exponentiation, right-associative.
(defun rdh/term ()
  (let* ((prim (rdh/prim))
         (tok (peektok)))
    (cond ((or (null tok)
               (member tok '(")" + - / *)))
           prim)
          ((equal tok '^)
           (gettok)
           (list tok prim (rdh/term)))
          (t (error "bad term")))))

;; prim: a number or a parenthesized expression.
(defun rdh/prim ()
  (let ((tok (gettok)))
    (cond ((numberp tok) tok)
          ((equal tok "(")
           (let* ((expr (rdh/expr))
                  (tok (peektok)))
             (if (not (equal tok ")"))
                 (error "bad parenthesized expr")
               (gettok)
               expr)))
          (t (error "bad prim")))))
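If you would rather feed the parser from text than from a hand-written list, a tokenizer along the following lines should do. This is only a sketch: rdh/tokenize-string is a name invented here, and it assumes non-negative integer literals only. It walks a temporary buffer with looking-at and produces tokens in the shape the parser above compares against (numbers, operator symbols, and the strings "(" and ")").

```elisp
;; Hypothetical helper: turn a string into a token list for the parser.
(defun rdh/tokenize-string (text)
  "Return the tokens of TEXT as a list.
Integers become numbers, operators become symbols, and
parentheses become the strings \"(\" and \")\"."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (tokens)
      (skip-chars-forward " \t\n")
      (while (not (eobp))
        (cond
         ;; Integer literal.
         ((looking-at "[0-9]+")
          (push (string-to-number (match-string 0)) tokens))
         ;; Parentheses stay strings, as the parser expects.
         ((looking-at "[()]")
          (push (match-string 0) tokens))
         ;; Operators become symbols such as + or ^.
         ((looking-at "[-+*/^]")
          (push (intern (match-string 0)) tokens))
         (t (error "Unexpected character: %c" (char-after))))
        ;; Step over the token just matched, then over whitespace.
        (goto-char (match-end 0))
        (skip-chars-forward " \t\n"))
      (nreverse tokens))))
```

With that in place, (setq *token* (rdh/tokenize-string "(1 + 2) * 3")) followed by (rdh/expr) should give (* (+ 1 2) 3).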


\"骚年 ilove

#4 · 2019-01-31 15:10

There is a relatively simple parser you can find on the Emacs Wiki: ParserCompiler

The Parser Compiler for Emacs creates Recursive Descent parsers in pure elisp.

The goal of the project is to create a useful Parser Compiler that is both innovative and practically useful. This is an original work created by Mike Mattie - codermattie@gmail.com

Parsers are compiled by a Macro that translates a parser definition DSL to pure elisp. The syntax supports the PEG grammar class currently.

