I am trying to define lexer rules for PostgreSQL SQL.
The problem is with the operator definition and the line comments conflicting with each other.
For example, @--- is an operator token @- followed by the -- comment, and not an operator token @---.
In grako it would be possible to define the negative lookahead for the - fragment like:
OP_MINUS: '-' ! ( '-' ) .
In ANTLR4 I could not find any way to roll back an already consumed fragment.
Any ideas?
Here is the original definition of what a PostgreSQL operator name can be:
The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:
+ - * / < > = ~ ! @ # % ^ & | ` ?
There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.
A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
You can use a semantic predicate in your lexer rules to perform lookahead (or lookbehind) without consuming characters. For example, the following covers several of the rules for an operator.
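The rule itself is not reproduced in this text, so what follows is only a minimal sketch of what such a rule could look like, not necessarily the original. It assumes the Java target, where `_input.LA(1)` inside a lexer predicate peeks at the next character without consuming it. The grammar name and the token names `OPERATOR`, `LINE_COMMENT`, `BLOCK_COMMENT` and `WS` are made up for illustration.

```antlr
lexer grammar PgOperatorSketch;   // hypothetical grammar name

// Comment rules, simplified. PostgreSQL block comments can nest;
// this sketch does not handle nesting.
LINE_COMMENT  : '--' ~[\r\n]* -> skip ;
BLOCK_COMMENT : '/*' .*? '*/' -> skip ;

// One or more operator characters, where
//   '-' is accepted only if the next character is not another '-'
//   '/' is accepted only if the next character is not '*'
// The predicates use the Java target's _input field (an assumption).
OPERATOR
    : ( [+*<>=~!@#%^&|`?]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )+
    ;

WS : [ \t\r\n]+ -> skip ;
```

With a rule of this shape, the predicate should reject a `-` whenever another `-` follows it, so in `@---` the operator token ends right after `@` and the remainder is left for the comment rule. The predicates are target-specific; other targets need their own spelling of the lookahead.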
However, the above rule does not address the restrictions on including a + or - at the end of an operator. To handle that in the easiest way possible, I would probably separate the two cases into separate rules.
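One way the split might look is sketched below. This is an illustration under stated assumptions, not a tested grammar: it assumes the Java target (where `_input.LA(1)` peeks at the next character), and the names `PgOperatorSplitSketch`, `OPERATOR` and `OPERATOR_WITH_SPECIAL` are hypothetical. Comment and whitespace rules are left out for brevity.

```antlr
lexer grammar PgOperatorSplitSketch;   // hypothetical grammar name

// Case 1: only + - * / < > = appear in the name.
// A multicharacter name must then end in one of * / < > = ;
// a lone + or - is still accepted as a one-character operator.
OPERATOR
    : ( [+*<>=]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )*
      ( [*<>=]
      | '/' {_input.LA(1) != '*'}?
      )
    | '+'
    | '-' {_input.LA(1) != '-'}?
    ;

// Case 2: the name contains at least one of ~ ! @ # % ^ & | ` ?
// and may therefore end in + or -. The token is reported as OPERATOR.
OPERATOR_WITH_SPECIAL
    : ( [+*<>=]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )*
      [~!@#%^&|`?]
      ( [+*<>=~!@#%^&|`?]
      | '-' {_input.LA(1) != '-'}?
      | '/' {_input.LA(1) != '*'}?
      )*
      -> type(OPERATOR)
    ;
```

Because the lexer keeps the longest match that ends in an accepting state, `*-` should come out as `*` followed by `-`, while `@-` should stay a single token. The two rules never match the same text (one requires a special character, the other forbids it), so their relative order should not matter. The NAMEDATALEN-1 length limit is not enforced here.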