我想学习BNF并试图组装一些Z80 ASM代码。 由于我是新来的这两个领域,我的问题是,我是连在正确的轨道? 我想写Z80 ASM作为EBNF的格式,这样我可以再找出哪里从那里从源创建的机器代码。 目前,我有以下几点:
Assignment = Identifier, ":" ;
Instruction = Opcode, [ Operand ], [ Operand ] ;
Operand = Identifier | Something* ;
Something* = "(" , Identifier, ")" ;
Identifier = Alpha, { Numeric | Alpha } ;
Opcode = Alpha, Alpha ;
Int = [ "-" ], Numeric, { Numeric } ;
Alpha = "A" | "B" | "C" | "D" | "E" | "F" |
"G" | "H" | "I" | "J" | "K" | "L" |
"M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" |
"Y" | "Z" ;
Numeric = "0" | "1" | "2" | "3"| "4" |
"5" | "6" | "7" | "8" | "9" ;
如果我错了任何方向性反馈将是极好的。
老派装配是典型的手工编码的汇编器和用于自组织解析技术来处理组件的源极线,以产生实际的汇编代码。 当汇编语法很简单(如总操作码REG,操作数),这已经足够好了。
现代机器具有杂乱的,讨厌的指令集与许多指令的变化和操作数,其可以与复杂的语法允许多个索引寄存器参加操作数式表示。 允许与固定的和可重定位常数精密装配时表达式与各种类型的加法运算的复杂这一点。 先进的装配允许条件编译,宏,结构化数据报关单等所有语法上添加了新的要求。 通过特殊方法处理这一切的语法非常硬,是解析器发电机被发明的原因。
使用BNF和解析器发电机是建立一个现代化的组装非常合理的方式,即使是作为Z80的传统处理器,。 我已经做了这样的汇编摩托罗拉8位的机器,如六千八百零九分之六千八百,和我准备了一个现代的x86这样做。 我觉得你的方向准确地记录了正确的道路。
**********编辑****************的任择议定书要求,例如词法和语法分析器定义。 我在这里提供两种。
这些是从一个6809 asssembler真正规范的摘录。 完整的定义在这里,2-3X样本的大小。
为了保持空间的时候,我已经编辑了许多暗角落的复杂性是这些定义的点。 有人可能会由apparant复杂不要惊惶。 问题是,这样的定义,您试图描述语言的形状,而不是在程序代码时。 你会付出更高的显著复杂,如果你编写这一切都在一个特设的方式,这将是远远少维护。
它也将有一定的帮助要知道,这些定义与具有词法/分析工具作为子系统高端项目分析系统中,被称为的DMS软件再造工具包 。 DMS会自动建立从AST的
在解析器specfication,这使得它容易得多,以打造专业化分析工具语法规则。 最后,解析器规范包含所谓的“prettyprinter”的声明,这使DMS以regenreate从AST的源文本。 (一本语法的真正目的是为了让我们能够建立代表汇编指令的AST,然后将它们吐出馈送到一个真正的汇编!)
值得注意的一两件事:语意和语法规则如何规定(!的metasyntxax)不同的词法/语法分析器生成系统之间略有不同。 基于DMS-规范的语法也不例外。 DMS有自己的相对复杂的语法规则,那真的是不实际在这里的可用空间来解释。 您将有其它系统使用类似的符号,对于EBNF的规则,理念生活和和语意正则表达式的变体。
鉴于OP的利益,他可以实现类似的词法分析器/解析器与任何词法分析器/解析器生成工具,例如,FLEX / YACC,JavaCC中,ANTLR,...
********** LEXER **************
-- M6809.lex: Lexical Description for M6809
-- Copyright (C) 1989,1999-2002 Ira D. Baxter
%%
#mainmode Label
#macro digit "[0-9]"
#macro hexadecimaldigit "<digit>|[a-fA-F]"
#macro comment_body_character "[\u0009 \u0020-\u007E]" -- does not include NEWLINE
#macro blank "[\u0000 \ \u0009]"
#macro hblanks "<blank>+"
#macro newline "\u000d \u000a? \u000c? | \u000a \u000c?" -- form feed allowed only after newline
#macro bare_semicolon_comment "\; <comment_body_character>* "
#macro bare_asterisk_comment "\* <comment_body_character>* "
...[snip]
#macro hexadecimal_digit "<digit> | [a-fA-F]"
#macro binary_digit "[01]"
#macro squoted_character "\' [\u0021-\u007E]"
#macro string_character "[\u0009 \u0020-\u007E]"
%%Label -- (First mode) processes left hand side of line: labels, opcodes, etc.
#skip "(<blank>*<newline>)+"
#skip "(<blank>*<newline>)*<blank>+"
<< (GotoOpcodeField ?) >>
#precomment "<comment_line><newline>"
#preskip "(<blank>*<newline>)+"
#preskip "(<blank>*<newline>)*<blank>+"
<< (GotoOpcodeField ?) >>
-- Note that an apparant register name is accepted as a label in this mode
#token LABEL [STRING] "<identifier>"
<< (local (;; (= [TokenScan natural] 1) ; process all string characters
(= [TokenLength natural] ?:TokenCharacterCount)=
(= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
(= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
[ThisCharacterCode natural]
(define Ordinala #61)
(define Ordinalf #66)
(define OrdinalA #41)
(define OrdinalF #46)
);;
(;; (= (@ Result) `') ; start with empty string
(while (<= TokenScan TokenLength)
(;; (= ThisCharacterCode (coerce natural TokenString:TokenScan))
(+= TokenScan) ; bump past character
(ifthen (>= ThisCharacterCode Ordinala)
(-= ThisCharacterCode #20) ; fold to upper case
)ifthen
(= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=
);;
)while
);;
)local
(= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string
(GotoLabelList ?)
>>
%%OpcodeField
#skip "<hblanks>"
<< (GotoEOLComment ?) >>
#ifnotoken
<< (GotoEOLComment ?) >>
-- Opcode field tokens
#token 'ABA' "[aA][bB][aA]"
<< (GotoEOLComment ?) >>
#token 'ABX' "[aA][bB][xX]"
<< (GotoEOLComment ?) >>
#token 'ADC' "[aA][dD][cC]"
<< (GotoABregister ?) >>
#token 'ADCA' "[aA][dD][cC][aA]"
<< (GotoOperand ?) >>
#token 'ADCB' "[aA][dD][cC][bB]"
<< (GotoOperand ?) >>
#token 'ADCD' "[aA][dD][cC][dD]"
<< (GotoOperand ?) >>
#token 'ADD' "[aA][dD][dD]"
<< (GotoABregister ?) >>
#token 'ADDA' "[aA][dD][dD][aA]"
<< (GotoOperand ?) >>
#token 'ADDB' "[aA][dD][dD][bB]"
<< (GotoOperand ?) >>
#token 'ADDD' "[aA][dD][dD][dD]"
<< (GotoOperand ?) >>
#token 'AND' "[aA][nN][dD]"
<< (GotoABregister ?) >>
#token 'ANDA' "[aA][nN][dD][aA]"
<< (GotoOperand ?) >>
#token 'ANDB' "[aA][nN][dD][bB]"
<< (GotoOperand ?) >>
#token 'ANDCC' "[aA][nN][dD][cC][cC]"
<< (GotoRegister ?) >>
...[long list of opcodes snipped]
#token IDENTIFIER [STRING] "<identifier>"
<< (local (;; (= [TokenScan natural] 1) ; process all string characters
(= [TokenLength natural] ?:TokenCharacterCount)=
(= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
(= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
[ThisCharacterCode natural]
(define Ordinala #61)
(define Ordinalf #66)
(define OrdinalA #41)
(define OrdinalF #46)
);;
(;; (= (@ Result) `') ; start with empty string
(while (<= TokenScan TokenLength)
(;; (= ThisCharacterCode (coerce natural TokenString:TokenScan))
(+= TokenScan) ; bump past character
(ifthen (>= ThisCharacterCode Ordinala)
(-= ThisCharacterCode #20) ; fold to upper case
)ifthen
(= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=
);;
)while
);;
)local
(= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string
(GotoOperandField ?)
>>
#token '#' "\#" -- special constant introduction (FDB)
<< (GotoDataField ?) >>
#token NUMBER [NATURAL] "<decimal_number>"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
(GotoOperandField ?)
>>
#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
(GotoOperandField ?)
>>
#token NUMBER [NATURAL] "\% <binary_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
(GotoOperandField ?)
>>
#token CHARACTER [CHARACTER] "<squoted_character>"
<< (= ?:Lexeme:Literal:Character:Value (TokenStringCharacter ? 2))
(= ?:Lexeme:Literal:Character:Format (LiteralFormat:MakeCompactCharacterLiteralFormat 0 0)) ; nothing special about character
(GotoOperandField ?)
>>
%%OperandField
#skip "<hblanks>"
<< (GotoEOLComment ?) >>
#ifnotoken
<< (GotoEOLComment ?) >>
-- Tokens signalling switch to index register modes
#token ',' "\,"
<<(GotoRegisterField ?)>>
#token '[' "\["
<<(GotoRegisterField ?)>>
-- Operators for arithmetic syntax
#token '!!' "\!\!"
#token '!' "\!"
#token '##' "\#\#"
#token '#' "\#"
#token '&' "\&"
#token '(' "\("
#token ')' "\)"
#token '*' "\*"
#token '+' "\+"
#token '-' "\-"
#token '/' "\/"
#token '//' "\/\/"
#token '<' "\<"
#token '<' "\<"
#token '<<' "\<\<"
#token '<=' "\<\="
#token '</' "\<\/"
#token '=' "\="
#token '>' "\>"
#token '>' "\>"
#token '>=' "\>\="
#token '>>' "\>\>"
#token '>/' "\>\/"
#token '\\' "\\"
#token '|' "\|"
#token '||' "\|\|"
#token NUMBER [NATURAL] "<decimal_number>"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
#token NUMBER [NATURAL] "\$ <hexadecimal_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertHexadecimalTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
#token NUMBER [NATURAL] "\% <binary_digit>+"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertBinaryTokenStringToNatural (. format) ? 1 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
-- Notice that an apparent register is accepted as a label in this mode
#token IDENTIFIER [STRING] "<identifier>"
<< (local (;; (= [TokenScan natural] 1) ; process all string characters
(= [TokenLength natural] ?:TokenCharacterCount)=
(= [TokenString (reference TokenBodyT)] (. ?:TokenCharacters))
(= [Result (reference string)] (. ?:Lexeme:Literal:String:Value))
[ThisCharacterCode natural]
(define Ordinala #61)
(define Ordinalf #66)
(define OrdinalA #41)
(define OrdinalF #46)
);;
(;; (= (@ Result) `') ; start with empty string
(while (<= TokenScan TokenLength)
(;; (= ThisCharacterCode (coerce natural TokenString:TokenScan))
(+= TokenScan) ; bump past character
(ifthen (>= ThisCharacterCode Ordinala)
(-= ThisCharacterCode #20) ; fold to upper case
)ifthen
(= (@ Result) (append (@ Result) (coerce character ThisCharacterCode)))=
);;
)while
);;
)local
(= ?:Lexeme:Literal:String:Format (LiteralFormat:MakeCompactStringLiteralFormat 0)) ; nothing interesting in string
>>
%%Register -- operand field for TFR, ANDCC, ORCC, EXG opcodes
#skip "<hblanks>"
#ifnotoken << (GotoRegisterField ?) >>
%%RegisterField -- handles registers and indexing mode syntax
-- In this mode, names that look like registers are recognized as registers
#skip "<hblanks>"
<< (GotoEOLComment ?) >>
#ifnotoken
<< (GotoEOLComment ?) >>
#token '[' "\["
#token ']' "\]"
#token '--' "\-\-"
#token '++' "\+\+"
#token 'A' "[aA]"
#token 'B' "[bB]"
#token 'CC' "[cC][cC]"
#token 'DP' "[dD][pP] | [dD][pP][rR]" -- DPR shouldnt be needed, but found one instance
#token 'D' "[dD]"
#token 'Z' "[zZ]"
-- Index register designations
#token 'X' "[xX]"
#token 'Y' "[yY]"
#token 'U' "[uU]"
#token 'S' "[sS]"
#token 'PCR' "[pP][cC][rR]"
#token 'PC' "[pP][cC]"
#token ',' "\,"
-- Operators for arithmetic syntax
#token '!!' "\!\!"
#token '!' "\!"
#token '##' "\#\#"
#token '#' "\#"
#token '&' "\&"
#token '(' "\("
#token ')' "\)"
#token '*' "\*"
#token '+' "\+"
#token '-' "\-"
#token '/' "\/"
#token '<' "\<"
#token '<' "\<"
#token '<<' "\<\<"
#token '<=' "\<\="
#token '<|' "\<\|"
#token '=' "\="
#token '>' "\>"
#token '>' "\>"
#token '>=' "\>\="
#token '>>' "\>\>"
#token '>|' "\>\|"
#token '\\' "\\"
#token '|' "\|"
#token '||' "\|\|"
#token NUMBER [NATURAL] "<decimal_number>"
<< (local [format LiteralFormat:NaturalLiteralFormat]
(;; (= ?:Lexeme:Literal:Natural:Value (ConvertDecimalTokenStringToNatural (. format) ? 0 0))
(= ?:Lexeme:Literal:Natural:Format (LiteralFormat:MakeCompactNaturalLiteralFormat format))
);;
)local
>>
... [snip]
%% -- end M6809.lex
**************** PARSER **************
-- M6809.ATG: Motorola 6809 assembly code parser
-- (C) Copyright 1989;1999-2002 Ira D. Baxter; All Rights Reserved
m6809 = sourcelines ;
sourcelines = ;
sourcelines = sourcelines sourceline EOL ;
<<PrettyPrinter>>: { V(CV(sourcelines[1]),H(sourceline,A<eol>(EOL))); }
-- leading opcode field symbol should be treated as keyword.
sourceline = ;
sourceline = labels ;
sourceline = optional_labels 'EQU' expression ;
<<PrettyPrinter>>: { H(optional_labels,A<opcode>('EQU'),A<operand>(expression)); }
sourceline = LABEL 'SET' expression ;
<<PrettyPrinter>>: { H(A<firstlabel>(LABEL),A<opcode>('SET'),A<operand>(expression)); }
sourceline = optional_label instruction ;
<<PrettyPrinter>>: { H(optional_label,instruction); }
sourceline = optional_label optlabelleddirective ;
<<PrettyPrinter>>: { H(optional_label,optlabelleddirective); }
sourceline = optional_label implicitdatadirective ;
<<PrettyPrinter>>: { H(optional_label,implicitdatadirective); }
sourceline = unlabelleddirective ;
sourceline = '?ERROR' ;
<<PrettyPrinter>>: { A<opcode>('?ERROR'); }
optional_label = labels ;
optional_label = LABEL ':' ;
<<PrettyPrinter>>: { H(A<firstlabel>(LABEL),':'); }
optional_label = ;
optional_labels = ;
optional_labels = labels ;
labels = LABEL ;
<<PrettyPrinter>>: { A<firstlabel>(LABEL); }
labels = labels ',' LABEL ;
<<PrettyPrinter>>: { H(labels[1],',',A<otherlabels>(LABEL)); }
unlabelleddirective = 'END' ;
<<PrettyPrinter>>: { A<opcode>('END'); }
unlabelleddirective = 'END' expression ;
<<PrettyPrinter>>: { H(A<opcode>('END'),A<operand>(expression)); }
unlabelleddirective = 'IF' expression EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('IF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFDEF' IDENTIFIER EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('IFDEF'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'IFUND' IDENTIFIER EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('IFUND'),H(A<operand>(IDENTIFIER),A<eol>(EOL))),CV(conditional)); }
unlabelleddirective = 'INCLUDE' FILENAME ;
<<PrettyPrinter>>: { H(A<opcode>('INCLUDE'),A<operand>(FILENAME)); }
unlabelleddirective = 'LIST' expression ;
<<PrettyPrinter>>: { H(A<opcode>('LIST'),A<operand>(expression)); }
unlabelleddirective = 'NAME' IDENTIFIER ;
<<PrettyPrinter>>: { H(A<opcode>('NAME'),A<operand>(IDENTIFIER)); }
unlabelleddirective = 'ORG' expression ;
<<PrettyPrinter>>: { H(A<opcode>('ORG'),A<operand>(expression)); }
unlabelleddirective = 'PAGE' ;
<<PrettyPrinter>>: { A<opcode>('PAGE'); }
unlabelleddirective = 'PAGE' HEADING ;
<<PrettyPrinter>>: { H(A<opcode>('PAGE'),A<operand>(HEADING)); }
unlabelleddirective = 'PCA' expression ;
<<PrettyPrinter>>: { H(A<opcode>('PCA'),A<operand>(expression)); }
unlabelleddirective = 'PCC' expression ;
<<PrettyPrinter>>: { H(A<opcode>('PCC'),A<operand>(expression)); }
unlabelleddirective = 'PSR' expression ;
<<PrettyPrinter>>: { H(A<opcode>('PSR'),A<operand>(expression)); }
unlabelleddirective = 'TABS' numberlist ;
<<PrettyPrinter>>: { H(A<opcode>('TABS'),A<operand>(numberlist)); }
unlabelleddirective = 'TITLE' HEADING ;
<<PrettyPrinter>>: { H(A<opcode>('TITLE'),A<operand>(HEADING)); }
unlabelleddirective = 'WITH' settings ;
<<PrettyPrinter>>: { H(A<opcode>('WITH'),A<operand>(settings)); }
settings = setting ;
settings = settings ',' setting ;
<<PrettyPrinter>>: { H*; }
setting = 'WI' '=' NUMBER ;
<<PrettyPrinter>>: { H*; }
setting = 'DE' '=' NUMBER ;
<<PrettyPrinter>>: { H*; }
setting = 'M6800' ;
setting = 'M6801' ;
setting = 'M6809' ;
setting = 'M6811' ;
-- collects lines of conditional code into blocks
conditional = 'ELSEIF' expression EOL conditional ;
<<PrettyPrinter>>: { V(H(A<opcode>('ELSEIF'),H(A<operand>(expression),A<eol>(EOL))),CV(conditional[1])); }
conditional = 'ELSE' EOL else ;
<<PrettyPrinter>>: { V(H(A<opcode>('ELSE'),A<eol>(EOL)),CV(else)); }
conditional = 'FIN' ;
<<PrettyPrinter>>: { A<opcode>('FIN'); }
conditional = sourceline EOL conditional ;
<<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(conditional[1])); }
else = 'FIN' ;
<<PrettyPrinter>>: { A<opcode>('FIN'); }
else = sourceline EOL else ;
<<PrettyPrinter>>: { V(H(sourceline,A<eol>(EOL)),CV(else[1])); }
-- keyword-less directive, generates data tables
implicitdatadirective = implicitdatadirective ',' implicitdataitem ;
<<PrettyPrinter>>: { H*; }
implicitdatadirective = implicitdataitem ;
implicitdataitem = '#' expression ;
<<PrettyPrinter>>: { A<operand>(H('#',expression)); }
implicitdataitem = '+' expression ;
<<PrettyPrinter>>: { A<operand>(H('+',expression)); }
implicitdataitem = '-' expression ;
<<PrettyPrinter>>: { A<operand>(H('-',expression)); }
implicitdataitem = expression ;
<<PrettyPrinter>>: { A<operand>(expression); }
implicitdataitem = STRING ;
<<PrettyPrinter>>: { A<operand>(STRING); }
-- instructions valid for m680C (see Software Dynamics ASM manual)
instruction = 'ABA' ;
<<PrettyPrinter>>: { A<opcode>('ABA'); }
instruction = 'ABX' ;
<<PrettyPrinter>>: { A<opcode>('ABX'); }
instruction = 'ADC' 'A' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADC','A')),A<operand>(operandfetch)); }
instruction = 'ADC' 'B' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADC','B')),A<operand>(operandfetch)); }
instruction = 'ADCA' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADCA'),A<operand>(operandfetch)); }
instruction = 'ADCB' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADCB'),A<operand>(operandfetch)); }
instruction = 'ADCD' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADCD'),A<operand>(operandfetch)); }
instruction = 'ADD' 'A' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADD','A')),A<operand>(operandfetch)); }
instruction = 'ADD' 'B' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>(H('ADD','B')),A<operand>(operandfetch)); }
instruction = 'ADDA' operandfetch ;
<<PrettyPrinter>>: { H(A<opcode>('ADDA'),A<operand>(operandfetch)); }
[..snip...]
-- condition code mask for ANDCC and ORCC
conditionmask = '#' expression ;
<<PrettyPrinter>>: { H*; }
conditionmask = expression ;
target = expression ;
operandfetch = '#' expression ; --immediate
<<PrettyPrinter>>: { H*; }
operandfetch = memoryreference ;
operandstore = memoryreference ;
memoryreference = '[' indexedreference ']' ;
<<PrettyPrinter>>: { H*; }
memoryreference = indexedreference ;
indexedreference = offset ;
indexedreference = offset ',' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' '--' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' '-' indexregister ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '++' ;
<<PrettyPrinter>>: { H*; }
indexedreference = ',' indexregister '+' ;
<<PrettyPrinter>>: { H*; }
offset = '>' expression ; -- page zero ref
<<PrettyPrinter>>: { H*; }
offset = '<' expression ; -- long reference
<<PrettyPrinter>>: { H*; }
offset = expression ;
offset = 'A' ;
offset = 'B' ;
offset = 'D' ;
registerlist = registername ;
registerlist = registerlist ',' registername ;
<<PrettyPrinter>>: { H*; }
registername = 'A' ;
registername = 'B' ;
registername = 'CC' ;
registername = 'DP' ;
registername = 'D' ;
registername = 'Z' ;
registername = indexregister ;
indexregister = 'X' ;
indexregister = 'Y' ;
indexregister = 'U' ; -- not legal on M6811
indexregister = 'S' ;
indexregister = 'PCR' ;
indexregister = 'PC' ;
expression = sum '=' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '<<' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '</' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '<=' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '<' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>>' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>/' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>=' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '>' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum '#' sum ;
<<PrettyPrinter>>: { H*; }
expression = sum ;
sum = product ;
sum = sum '+' product ;
<<PrettyPrinter>>: { H*; }
sum = sum '-' product ;
<<PrettyPrinter>>: { H*; }
sum = sum '!' product ;
<<PrettyPrinter>>: { H*; }
sum = sum '!!' product ;
<<PrettyPrinter>>: { H*; }
product = term '*' product ;
<<PrettyPrinter>>: { H*; }
product = term '||' product ; -- wrong?
<<PrettyPrinter>>: { H*; }
product = term '/' product ;
<<PrettyPrinter>>: { H*; }
product = term '//' product ;
<<PrettyPrinter>>: { H*; }
product = term '&' product ;
<<PrettyPrinter>>: { H*; }
product = term '##' product ;
<<PrettyPrinter>>: { H*; }
product = term ;
term = '+' term ;
<<PrettyPrinter>>: { H*; }
term = '-' term ;
<<PrettyPrinter>>: { H*; }
term = '\\' term ; -- complement
<<PrettyPrinter>>: { H*; }
term = '&' term ; -- not
term = IDENTIFIER ;
term = NUMBER ;
term = CHARACTER ;
term = '*' ;
term = '(' expression ')' ;
<<PrettyPrinter>>: { H*; }
numberlist = NUMBER ;
numberlist = numberlist ',' NUMBER ;
<<PrettyPrinter>>: { H*; }
BNF更一般用于例如Pascal,C ++,或从Algo1系列(包括现代语言,如C#)衍生真的什么结构,嵌套的语言。 如果我实现一个汇编程序,我可能会使用一些简单的正则表达式模式匹配的操作码和操作数。 它是因为我已经用Z80汇编语言一段时间,但你可能会使用这样的:
/\s*(\w{2,3})\s+((\w+)(,\w+)?)?/
这将匹配其由两个或三个字母的操作码,接着由逗号分隔的一个或两个操作数中的任一行。 提取组装线这样后,你会看着操作码和生成正确字节的指令,包括适用的操作数的值。
我上面使用正则表达式解析器列出的类型将被称为一个“特设”的解析器,这基本上意味着你分割和审查某种块基础(在汇编语言的情况下,通过文本行)的输入。
我不认为你需要overthink它。 有没有点制作解析器,除了需要“LD A,A”成负荷运行,目的和源寄存器时,你可以串整个事情(模案件和空格)直接匹配到一个操作码。
有没有那么多的操作码,和他们没有安排在这样一种方式,你真的从解析和理解IMO汇编得到多少好处。 显然,你需要为字节/地址/索引参数的解析器,但除此之外,我只希望有一个一对一的查找。