About a Prolog tokenizer

Posted 2019-09-08 09:23

Question:

One of my assignments asks us to build a Prolog tokenizer. So far I have written a predicate that replaces spaces and tabs with newlines, but I don't know how to integrate it into the main program.

The replace part looks like this:

replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]):- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).

The main program has a predicate called removewhite(List1, List2).

So how can I let removewhite execute replace?

Answer 1:

You are a bit off track for a tokenizer: removewhite/2 isn't going to buy you any useful functionality. Instead, consider a DCG (assuming your Prolog supports DCGs):

tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.

code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].

Despite its simplicity, this is a fairly efficient and easily extensible scanner. In SWI-Prolog, which has (non-ISO-compliant) extensions for efficient string handling, it can be called from the top level like this:

?- tokenize(`123  4 567  `, L).
L = [123, 4, 567]

or

?- atom_codes('123  4 567  ',Cs), tokenize(Cs, L).
Cs = [49, 50, 51, 32, 32, 52, 32, 53, 54|...],
L = [123, 4, 567] 
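
To illustrate the "easily extensible" point, here is a minimal sketch that adds one more token kind to the scanner above. The word//1 rule is a hypothetical addition, not part of the original answer, and it assumes SWI-Prolog's code_type/2 with the alpha type:

```prolog
% Sketch: the scanner above plus a hypothetical word//1 token rule.
% Assumes SWI-Prolog (code_type/2).
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
tokenize([Word|Tokens]) --> word(Word), tokenize(Tokens).   % new clause

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N, [C|Cs])}.
% Hypothetical extension: a run of alphabetic codes becomes an atom.
word(W) --> code_types(alpha, [C|Cs]), {atom_codes(W, [C|Cs])}.

% Greedily collect the longest run of codes of the given type.
code_types(Type, [C|Cs]) --> [C], {code_type(C, Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
```

With these definitions, the first answer to `atom_codes('foo 12 bar', Cs), tokenize(Cs, L)` should be `L = [foo, 12, bar]`. Each new token kind only needs one extra tokenize//1 clause and one rule built on code_types//2.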

Btw, in SWI-Prolog, number//1 is predefined (with much more functionality, of course) in library(dcg/basics).
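
As a rough sketch of what that could look like, the following uses blanks//0, number//1 and eos//0 from library(dcg/basics). The predicate names tokenize_basics/2 and tokens//1 are made up for this example, and the code is SWI-Prolog specific:

```prolog
% Sketch only: SWI-Prolog with library(dcg/basics).
:- use_module(library(dcg/basics)).

% tokenize_basics/2 and tokens//1 are hypothetical names for this example.
tokenize_basics(Codes, Tokens) :- phrase(tokens(Tokens), Codes).

% Skip optional whitespace, then read a number and continue.
tokens([N|Ns]) --> blanks, number(N), tokens(Ns).
% Only trailing whitespace remains before the end of input.
tokens([]) --> blanks, eos.
```

Calling `atom_codes('123  4 567  ', Cs), tokenize_basics(Cs, L)` should give `L = [123, 4, 567]`. Note that the library's number//1 also accepts signs and floats, so it is more permissive than the digit-only rule above.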

Anyway, about your question

how can I let removewhite execute replace?

I feel you're really barking up the wrong tree: removing a space, which is actually a separator, will mess up your input...



Answer 2:

You can write a more powerful predicate that replaces any element of a list of candidates:

replace_all(_, _, [], []).
replace_all(L, R, [X|T], [R|T2]):- 
    member(X, L),
    replace_all(L, R, T, T2).

replace_all(L, R, [X|T], [X|T2]) :- 
    \+ member(X, L),
    replace_all(L, R, T, T2).

Then, you will have

removewhite(List1, List2) :-
    replace_all([' ', '\t'], '\n', List1, List2).
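
A small self-contained sketch of how this could be tried out, assuming the input is a list of one-character atoms (as produced by atom_chars/2). It uses memberchk/2 instead of member/2 to avoid leaving choice points; that is a substitution made for this example, not part of the original answer:

```prolog
% Replace every element of the list that occurs in Olds with New.
replace_all(_, _, [], []).
replace_all(Olds, New, [X|T], [New|T2]) :-
    memberchk(X, Olds),
    replace_all(Olds, New, T, T2).
replace_all(Olds, New, [X|T], [X|T2]) :-
    \+ memberchk(X, Olds),
    replace_all(Olds, New, T, T2).

% Assumes the input is a list of characters, not character codes.
removewhite(List1, List2) :-
    replace_all([' ', '\t'], '\n', List1, List2).
```

For example, `atom_chars('a b\tc', Cs), removewhite(Cs, Out)` binds `Out` to `[a, '\n', b, '\n', c]`. If the input is a list of character codes instead, the atoms would need to be written as codes (for instance `0' ` style literals), otherwise nothing matches.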


Tags: prolog lexer