Writing a formal language parser with Lisp

Posted 2019-02-05 05:06

Question:

My company is designing a new domain-specific scripting language. I have to implement a parser that translates this new language into a common scripting language so that it can be executed.

I usually do this with Bison and Flex, which generate the C/C++ code of the translator.

I found similar tools for most mainstream programming languages, but none for Lisp.

Hasn't Lisp ever been used for that? What is the usual way to write a parser with Lisp?

Note: any Lisp implementation or dialect that could help is fine with me; I have no preference.

Answer 1:

To cover the Racket part of it:

People often write parsers and there are many ways to do so:

  • Write a recursive descent parser manually.
  • Use the parser-tools library in Racket, which is lex/yacc style.
  • Use Ragg, an AST generator generator that lets you write your grammar in BNF.
  • Use Parsack, a monadic parser combinator library similar to Haskell's Parsec.
  • I'm probably overlooking at least a half-dozen other options (e.g. I know there's at least one PEG style lib for Racket).
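The first option, a hand-written recursive descent parser, has much the same shape in any Lisp. Here is a minimal sketch in Common Lisp rather than Racket, assuming a toy arithmetic grammar that I made up for illustration. The names TOKENIZE, PARSE-EXPR, PARSE-TERM, PARSE-FACTOR and PARSE-ARITHMETIC are hypothetical, not part of any library. Each grammar rule becomes one function, and the output is a prefix s-expression:

```lisp
;; Assumed toy grammar (illustration only):
;;   expr   := term   (("+" | "-") term)*
;;   term   := factor (("*" | "/") factor)*
;;   factor := integer | "(" expr ")"

(defvar *tokens* '() "Remaining input tokens; rebound by PARSE-ARITHMETIC.")

(defun tokenize (string)
  "Turn STRING into a list of integers and operator/parenthesis characters."
  (let ((tokens '())
        (i 0))
    (loop while (< i (length string))
          do (let ((c (char string i)))
               (cond ((char= c #\Space) (incf i))
                     ((digit-char-p c)
                      ;; PARSE-INTEGER returns the value and the index where it stopped.
                      (multiple-value-bind (value end)
                          (parse-integer string :start i :junk-allowed t)
                        (push value tokens)
                        (setf i end)))
                     ((find c "+-*/()")
                      (push c tokens)
                      (incf i))
                     (t (error "Unexpected character ~S at position ~D" c i)))))
    (nreverse tokens)))

(defun operator-symbol (char)
  "Map an operator character to the Lisp symbol used in the output tree."
  (ecase char (#\+ '+) (#\- '-) (#\* '*) (#\/ '/)))

(defun parse-factor ()
  (let ((token (pop *tokens*)))
    (cond ((integerp token) token)
          ((eql token #\()
           (let ((inner (parse-expr)))
             (unless (eql (pop *tokens*) #\))
               (error "Expected a closing parenthesis"))
             inner))
          (t (error "Unexpected token ~S" token)))))

(defun parse-term ()
  (let ((tree (parse-factor)))
    (loop while (member (first *tokens*) '(#\* #\/))
          do (setf tree (list (operator-symbol (pop *tokens*))
                              tree
                              (parse-factor))))
    tree))

(defun parse-expr ()
  (let ((tree (parse-term)))
    (loop while (member (first *tokens*) '(#\+ #\-))
          do (setf tree (list (operator-symbol (pop *tokens*))
                              tree
                              (parse-term))))
    tree))

(defun parse-arithmetic (string)
  "Parse STRING and return a prefix s-expression; signal an error on leftover input."
  (let* ((*tokens* (tokenize string))
         (tree (parse-expr)))
    (when *tokens*
      (error "Trailing tokens: ~S" *tokens*))
    tree))

(parse-arithmetic "1 + 2 * (3 - 4)") ; => (+ 1 (* 2 (- 3 4)))
```

Because the result is itself a valid Lisp form, translating the input to something executable is already done in this toy case: (eval (parse-arithmetic "1 + 2 * (3 - 4)")) gives -1.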


Answer 2:

Well, "the usual" way to do this in Common Lisp is … to do it in Lisp.

A lot of domain-specific languages (and Lisp is pretty much notoriously specialized for this purpose!) are simply written as extensions to Lisp itself, using the macro facility. The upside is that writing a DSL this way is trivial. The downside is that such DSLs often tend to "look like" Lisp.

Some examples of DSLs within the Common Lisp standard include the LOOP macro's own sub-language and the sub-language of the FORMAT specifiers.
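To make the macro approach concrete, here is a minimal sketch of a made-up state-machine DSL. DEFINE-STATE-MACHINE and the "->" notation are hypothetical names invented for this example, not something from the standard or from a library. The macro rewrites the transition table into an ordinary DEFUN at macroexpansion time, so no lexer or parser is involved:

```lisp
(defmacro define-state-machine (name &body transitions)
  "Define function NAME of (STATE EVENT) that returns the next state, or NIL.
Each transition is written as (SOURCE TRIGGER -> TARGET)."
  `(defun ,name (state event)
     (cond
       ,@(loop for (source trigger arrow target) in transitions
               ;; Check the surface syntax at macroexpansion time.
               do (assert (string= arrow "->"))
               collect `((and (eq state ',source) (eq event ',trigger))
                         ',target)))))

;; The "program" written in the DSL:
(define-state-machine door-next-state
  (closed open-door  -> opened)
  (opened close-door -> closed)
  (closed lock       -> locked)
  (locked unlock     -> closed))

(door-next-state 'closed 'lock)      ; => LOCKED
(door-next-state 'locked 'open-door) ; => NIL (no such transition)
```

Note that the transition table still has to be wrapped in parentheses, which is exactly the "looks like Lisp" downside.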

Since Lisp's s-expression notation is nominally a written form of an Abstract Syntax Tree, adopting it for your language is one way to avoid writing much of your own lexer or parser; you can just use READ.
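A minimal sketch of the "just use READ" idea, assuming your scripts are stored as text made of s-expressions. READ-DSL-FORMS and the MOVE/TURN commands are hypothetical names used only for this example:

```lisp
(defun read-dsl-forms (string)
  "Read every s-expression in STRING and return them as a list, unevaluated."
  ;; Binding *READ-EVAL* to NIL disables the #. reader macro, so that
  ;; merely reading a script cannot execute code embedded in it.
  (let ((*read-eval* nil))
    (with-input-from-string (in string)
      (loop with eof = (gensym "EOF") ; unique end-of-file marker
            for form = (read in nil eof)
            until (eq form eof)
            collect form))))

(read-dsl-forms "(move 10) (turn :left)") ; => ((MOVE 10) (TURN :LEFT))
```

The result is plain list structure, so the translator can walk it with the usual list functions instead of a hand-built AST type.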

That said, you can use common lexer packages such as GRAYLEX or CL-LEXER; looking at the parsers for some other language with a syntax similar to yours might also help. In Quicklisp, I see:

CL-USER> (ql:system-apropos "parse")
#<SYSTEM cl-arff-parser / cl-arff-parser-20130421-git / quicklisp 2013-08-13>
#<SYSTEM cl-date-time-parser / cl-date-time-parser-20130813-git / quicklisp 2013-08-13>
#<SYSTEM cl-html-parse / cl-html-parse-20130813-git / quicklisp 2013-08-13>
#<SYSTEM cl-html5-parser / cl-html5-parser-20130615-git / quicklisp 2013-08-13>
#<SYSTEM cl-html5-parser-tests / cl-html5-parser-20130615-git / quicklisp 2013-08-13>
#<SYSTEM cl-pdf-parser / cl-pdf-20130420-git / quicklisp 2013-08-13>
#<SYSTEM cli-parser / cl-cli-parser-20120305-cvs / quicklisp 2013-08-13>
#<SYSTEM clpython.parser / clpython-20130615-git / quicklisp 2013-08-13>
#<SYSTEM com.gigamonkeys.parser / monkeylib-parser-20120208-git / quicklisp 2013-08-13>
#<SYSTEM com.informatimago.common-lisp.html-parser / com.informatimago-20130813-git / quicklisp 2013-08-13>
#<SYSTEM com.informatimago.common-lisp.parser / com.informatimago-20130813-git / quicklisp 2013-08-13>
#<SYSTEM csv-parser / csv-parser-20111001-git / quicklisp 2013-08-13>
#<SYSTEM fucc-parser / fucc_0.2.1 / quicklisp 2013-08-13>
#<SYSTEM http-parse / http-parse-20130615-git / quicklisp 2013-08-13>
#<SYSTEM http-parse-test / http-parse-20130615-git / quicklisp 2013-08-13>
#<SYSTEM js-parser / js-parser-20120909-git / quicklisp 2013-08-13>
#<SYSTEM parse-declarations-1.0 / parse-declarations-20101006-darcs / quicklisp 2013-08-13>
#<SYSTEM parse-float / parse-float-20121125-git / quicklisp 2013-08-13>
#<SYSTEM parse-float-tests / parse-float-20121125-git / quicklisp 2013-08-13>
#<SYSTEM parse-js / parse-js-20120305-git / quicklisp 2013-08-13>
#<SYSTEM parse-number / parse-number-1.3 / quicklisp 2013-08-13>
#<SYSTEM parse-number-range / parse-number-range-1.0 / quicklisp 2013-08-13>
#<SYSTEM parse-number-tests / parse-number-1.3 / quicklisp 2013-08-13>
#<SYSTEM parse-rgb / cl-tcod-20130615-hg / quicklisp 2013-08-13>
#<SYSTEM parseltongue / parseltongue-20130312-git / quicklisp 2013-08-13>
#<SYSTEM parser-combinators / cl-parser-combinators-20121125-git / quicklisp 2013-08-13>
#<SYSTEM parser-combinators-cl-ppcre / cl-parser-combinators-20121125-git / quicklisp 2013-08-13>
#<SYSTEM parser-combinators-tests / cl-parser-combinators-20121125-git / quicklisp 2013-08-13>
#<SYSTEM py-configparser / py-configparser-20101006-svn / quicklisp 2013-08-13>


Answer 3:

There are two ways to parse non-Lispy languages in Common Lisp.

1) Use readtables. This is the classic way: the Lisp reader algorithm is already a simple recursive-descent parser, and it supports character-based dispatch. Vacietis, a C-to-Common-Lisp compiler, takes this approach.
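A small sketch of the readtable approach, using only standard functions. It assumes a made-up bracket syntax in which [a, b, c] should read as a list; MAKE-BRACKET-READTABLE and READ-WITH-BRACKETS are hypothetical names chosen for this example. The idea is to copy the standard readtable and hang your own reader functions on the characters that start a construct:

```lisp
(defun make-bracket-readtable ()
  "Return a copy of the standard readtable in which [a, b, c] reads as a list."
  (let ((rt (copy-readtable nil)))
    ;; Give "," the syntax of a space so it can separate elements.
    (set-syntax-from-char #\, #\Space rt)
    ;; "[" dispatches to a function that reads forms up to the matching "]".
    (set-macro-character #\[
                         (lambda (stream char)
                           (declare (ignore char))
                           (read-delimited-list #\] stream t))
                         nil rt)
    ;; "]" must terminate tokens the same way ")" does.
    (set-macro-character #\] (get-macro-character #\) nil) nil rt)
    rt))

(defun read-with-brackets (string)
  "Read one form from STRING using the bracket readtable."
  (let ((*readtable* (make-bracket-readtable))
        (*read-eval* nil))
    (read-from-string string)))

(read-with-brackets "[1, 2, [3, 4], foo]") ; => (1 2 (3 4) FOO)
```

Nesting comes for free because the function installed on "[" calls back into the reader. A real language front end would install more such functions (for strings, operators, statement terminators and so on).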

2) Use a parsing library. I can recommend esrap as a good utility for packrat parsing, and smug as a decent one for monadic parsing. Both are available in Quicklisp.