Removing whitespace from strings in Prolog

Posted 2019-07-07 08:06

Question:

I am writing a parser in Prolog. It is not finished yet; this is part of the code. The next step is to remove all whitespace from the input string.

parse(Source, Tree) :-  kill_whitespace(Source, CleanInput), % remove whitespaces
                        actual_parse(CleanInput, Tree).

actual_parse(CleanInput, Tree):- phrase(expr(Tree),CleanInput).

expr(Ast) --> term(Ast1), expr_(Ast1,Ast).
expr_(Acc,Ast) --> " + ", !, term(Ast2), expr_(plus(Acc,Ast2), Ast).
expr_(Acc,Ast) --> " - ", !, term(Ast2), expr_(minus(Acc,Ast2), Ast).
expr_(Acc,Acc) --> [].

term(Ast) --> factor(Ast1), term_(Ast1,Ast).
term_(Acc,Ast) --> " * ", !, factor(Ast2), term_(mul(Acc,Ast2),Ast).
term_(Acc,Ast) --> " ** ", !, factor(Ast2), term_(pol(Acc,Ast2),Ast).
term_(Acc,Acc) --> [].

factor(Ast) --> "(", !, expr(Ast), ")".
factor(D)--> [X], { X >= 48 , X=<57 , D is X-48 }.
factor(id(N,E)) --> "x", factor(N), ":=", expr(E), ";".

For example:

?- parse("x2:=4",T).
    T = id(2, 4)

That works. But when I write:

?- parse("x2 := 4",T).
false.

This should succeed as well, so there should be a filter: kill_whitespace(Source, CleanInput).

The solutions I have tried are inefficient. How can I do that?

Answer 1:

I usually place a 'skip' nonterminal wherever whitespace can occur. Such a skip usually discards comments and any other 'uninteresting' text as well.

To keep it as simple as possible:

% discard any number of spaces
s --> "" ; " ", s.

I prefer a short name, to keep the grammar clean. To discard tabs and newlines as well:

s --> "" ; (" ";"\t";"\n";"\r"), s.
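As a rough sketch, this is one way the skip nonterminal might be threaded through the question's grammar. The placement of s, the operator literals written without surrounding spaces, the reordering of "**" before "*" (so the cut after "*" does not block "**"), the double_quotes flag, and the demo/1 helper are all assumptions of this sketch, not part of the original answer. The input ends with ";" because the assignment rule as posted requires it.

```prolog
:- set_prolog_flag(double_quotes, codes).   % assumption: input is a code list

% discard any number of whitespace characters
s --> "" ; (" " ; "\t" ; "\n" ; "\r"), s.

% trailing whitespace is skipped once, after the whole expression
parse(Source, Tree) :- phrase((expr(Tree), s), Source).

expr(Ast) --> term(Ast1), expr_(Ast1, Ast).
expr_(Acc, Ast) --> s, "+", !, term(Ast2), expr_(plus(Acc, Ast2), Ast).
expr_(Acc, Ast) --> s, "-", !, term(Ast2), expr_(minus(Acc, Ast2), Ast).
expr_(Acc, Acc) --> [].

term(Ast) --> factor(Ast1), term_(Ast1, Ast).
term_(Acc, Ast) --> s, "**", !, factor(Ast2), term_(pol(Acc, Ast2), Ast).
term_(Acc, Ast) --> s, "*", !, factor(Ast2), term_(mul(Acc, Ast2), Ast).
term_(Acc, Acc) --> [].

factor(Ast) --> s, "(", !, expr(Ast), s, ")".
factor(D) --> s, [X], { X >= 0'0, X =< 0'9, D is X - 0'0 }.
factor(id(N, E)) --> s, "x", factor(N), s, ":=", expr(E), s, ";".

% hypothetical helper: the string is read as codes inside this file
demo(T) :- parse("x2 := 4;", T).
% ?- demo(T).   expected: T = id(2, 4)
```

Every rule skips whitespace before the literal it is about to read, so no separate cleaning pass is needed.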

A 'style' note: instead of

parse(Source, Tree) :-
   expr(Tree, Source, []).

you could consider

parse(Source, Tree) :-
   phrase(expr(Tree), Source).


Answer 2:

The easy way is to make a first pass over the string and keep only the non-whitespace characters with a filter predicate. But this requires a second pass over the input.
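A minimal sketch of such a filter, assuming the input is a list of character codes and that only space, tab, newline and carriage return count as whitespace. The name kill_whitespace/2 comes from the question; is_space/1 and demo/1 are hypothetical helpers.

```prolog
:- set_prolog_flag(double_quotes, codes).   % assumption: strings are code lists

% is_space(C): C is one of the whitespace codes this sketch recognises
is_space(C) :- memberchk(C, " \t\n\r").

% kill_whitespace(+Codes, -Clean): copy Codes, dropping whitespace, in one pass
kill_whitespace([], []).
kill_whitespace([C|Cs], Clean) :-
    (   is_space(C)
    ->  Clean = Rest
    ;   Clean = [C|Rest]
    ),
    kill_whitespace(Cs, Rest).

% hypothetical helper to show the result as an atom
demo(Clean) :-
    kill_whitespace("x2 := 4;", Codes),
    atom_codes(Clean, Codes).
% ?- demo(C).   expected: C = 'x2:=4;'
```

The filter is linear in the length of the input. Note that once whitespace is removed, grammar literals such as " + " in the question would have to be written without the surrounding spaces in order to match.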

Another way to fix it is to use your own nonterminal to "get" characters, i.e. foo --> "a". becomes foo --> get("a"). where get//1 is something like:

get(X) --> [X].
get(X) --> whitespace, get(X).
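A runnable variant of this idea might look as follows. It is a sketch under assumptions: whitespace//0 is not defined in the answer, so a simple definition is supplied here; get//1 is given a code list such as "x" and matches it through a hypothetical helper lit//1, rather than matching a single element with [X]; digit//1, assign//1 and demo/1 are illustrative only.

```prolog
:- set_prolog_flag(double_quotes, codes).   % assumption: strings are code lists

% one whitespace character
whitespace --> [C], { memberchk(C, " \t\n\r") }.

% lit(Codes): match the given code list exactly
lit([]) --> [].
lit([C|Cs]) --> [C], lit(Cs).

% get(Codes): match Codes, skipping any whitespace in front of it
get(Codes) --> lit(Codes).
get(Codes) --> whitespace, get(Codes).

% a single decimal digit, also with leading whitespace skipped
digit(D) --> [C], { C >= 0'0, C =< 0'9, D is C - 0'0 }.
digit(D) --> whitespace, digit(D).

% illustrative rule: x<digit> := <digit>
assign(id(N, V)) --> get("x"), digit(N), get(":="), digit(V).

demo(T) :- phrase(assign(T), " x2 :=  4").
% ?- demo(T).   expected: T = id(2, 4)
```

Only leading whitespace is skipped here; trailing whitespace at the end of the input would still need to be handled separately.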


Answer 3:

The usual way of writing a parser is to write it in two stages:

The first stage conducts lexical analysis and produces a stream of tokens. Whitespace and other "tokens" not significant to the parse (e.g., comments) are discarded at this point.

The second stage conducts the parse itself, examining the list of tokens produced by the lexical analyzer.
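The two stages described above could be sketched like this. The token names (num/1, pow, assign, punct/1), the ws//0 helper and demo/1 are assumptions made for illustration; the parser rules mirror the grammar from the question but operate on tokens instead of character codes.

```prolog
:- set_prolog_flag(double_quotes, codes).   % assumption: input is a code list

% Stage 1: lexical analysis. Whitespace is dropped here.
tokens(Ts) --> ws, tokens_(Ts).
tokens_([T|Ts]) --> token(T), !, ws, tokens_(Ts).
tokens_([]) --> [].

ws --> [C], { memberchk(C, " \t\n\r") }, !, ws.
ws --> [].

token(num(D)) --> [C], { C >= 0'0, C =< 0'9, D is C - 0'0 }.
token(pow)    --> "**".                     % must be tried before single "*"
token(assign) --> ":=".
token(punct(A)) --> [C], { memberchk(C, "+-*();x"), atom_codes(A, [C]) }.

% Stage 2: parsing over the token list.
expr(Ast) --> term(A), expr_(A, Ast).
expr_(Acc, Ast) --> [punct('+')], !, term(B), expr_(plus(Acc, B), Ast).
expr_(Acc, Ast) --> [punct('-')], !, term(B), expr_(minus(Acc, B), Ast).
expr_(Acc, Acc) --> [].

term(Ast) --> factor(A), term_(A, Ast).
term_(Acc, Ast) --> [punct('*')], !, factor(B), term_(mul(Acc, B), Ast).
term_(Acc, Ast) --> [pow], !, factor(B), term_(pol(Acc, B), Ast).
term_(Acc, Acc) --> [].

factor(Ast) --> [punct('(')], !, expr(Ast), [punct(')')].
factor(D) --> [num(D)].
factor(id(N, E)) --> [punct(x)], factor(N), [assign], expr(E), [punct(';')].

parse(Source, Tree) :-
    phrase(tokens(Ts), Source),
    phrase(expr(Tree), Ts).

demo(T) :- parse("x2 := 4;", T).
% ?- demo(T).   expected: T = id(2, 4)
```

With this split the parser never sees whitespace at all, and changes to what counts as whitespace or a comment stay confined to the lexer.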