One of my assignments asks us to build a Prolog tokenizer. So far I have written a predicate that replaces spaces and tabs with newlines, but I don't know how to integrate it into the main program.
The replace part looks like this:
replace(_, _, [], []).
replace(O, R, [O|T], [R|T2]):- replace(O, R, T, T2).
replace(O, R, [H|T], [H|T2]) :- H \= O, replace(O, R, T, T2).
The main part has a predicate called removewhite(List1, List2). So how can I make removewhite call replace?
You are a bit off track for a tokenizer: removewhite/2 isn't going to buy you any useful functionality. Instead, consider a DCG (provided your Prolog supports them):
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).
tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.
code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
Despite its simplicity, this is a fairly efficient scanner, and it is easy to extend.
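To illustrate what extending the scanner could look like, here is a minimal sketch that adds one more token kind. The word(W) token and the use of the alpha character class are my own assumptions for illustration, not part of the original assignment, and the exact set of characters that alpha accepts may differ between Prolog systems. The number clause is kept before the word clause so that digits are always read as numbers.

```prolog
% Sketch only: the scanner above plus a hypothetical word(W) token.
tokenize(String, Tokens) :- phrase(tokenize(Tokens), String).

tokenize([]) --> [].
tokenize(Tokens) --> skip_spaces, tokenize(Tokens).
tokenize([Number|Tokens]) --> number(Number), tokenize(Tokens).
% New clause (assumption): a run of alphabetic codes becomes word(Atom).
tokenize([word(W)|Tokens]) -->
    code_types(alpha, [C|Cs]), {atom_codes(W, [C|Cs])},
    tokenize(Tokens).

skip_spaces --> code_types(white, [_|_]).
number(N) --> code_types(digit, [C|Cs]), {number_codes(N,[C|Cs])}.

% Collect the longest run of codes of the given type (possibly empty).
code_types(Type, [C|Cs]) --> [C], {code_type(C,Type)}, !, code_types(Type, Cs).
code_types(_, []) --> [].
```

With this loaded, a query such as ?- atom_codes('foo 12 bar', Cs), tokenize(Cs, L). should give L = [word(foo), 12, word(bar)] as its first solution.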
In SWI-Prolog, which has (non-ISO-compliant) extensions for efficient handling of strings, this can be called from the top level like:
?- tokenize(`123 4 567 `, L).
L = [123, 4, 567]
or
?- atom_codes('123  4 567 ',Cs), tokenize(Cs, L).
Cs = [49, 50, 51, 32, 32, 52, 32, 53, 54|...],
L = [123, 4, 567]
Btw, in SWI-Prolog, number//1 is predefined (with much more functionality, of course) in library(dcg/basics).
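As a rough sketch of what relying on the library could look like, the following uses number//1 and blanks//0 from library(dcg/basics). The names tokenize_basics/2 and tokens//1 are hypothetical, chosen here only to avoid clashing with the definitions above. Note that the library's number//1 also accepts signs and floats, so it is not a drop-in equivalent of the hand-written digit scanner.

```prolog
:- use_module(library(dcg/basics)).

% Hypothetical wrapper: tokenize a code list into a list of numbers.
tokenize_basics(Codes, Tokens) :- phrase(tokens(Tokens), Codes).

% Skip optional white space, read a number, and continue.
tokens([N|Ts]) --> blanks, number(N), !, tokens(Ts).
% No further number: only trailing white space may remain.
tokens([]) --> blanks.
```

For example, ?- atom_codes('123  4 567 ', Cs), tokenize_basics(Cs, L). should yield L = [123, 4, 567].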
Anyway, about your question
how can I let removewhite execute replace?
I feel you're really barking up the wrong tree: removing a space, which is actually a separator, will mess up your input.
You can write a more powerful predicate that replaces every element belonging to a given list:
replace_all(_, _, [], []).
replace_all(L, R, [X|T], [R|T2]) :-
    member(X, L),
    replace_all(L, R, T, T2).
replace_all(L, R, [X|T], [X|T2]) :-
    \+ member(X, L),
    replace_all(L, R, T, T2).
Then you will have
removewhite(List1, List2) :-
    replace_all([' ', '\t'], '\n', List1, List2).
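As a usage sketch, here is the whole thing in one loadable piece, with removewhite/2 calling replace_all/4. It assumes the input is a list of one-character atoms (as produced by atom_chars/2), not character codes; with a code list the atoms ' ' and '\t' would never match. The helper removewhite_atom/2 is hypothetical and only added to make the example easy to try.

```prolog
:- use_module(library(lists)).   % member/2 (autoloaded in SWI-Prolog)

replace_all(_, _, [], []).
replace_all(L, R, [X|T], [R|T2]) :-
    member(X, L),
    replace_all(L, R, T, T2).
replace_all(L, R, [X|T], [X|T2]) :-
    \+ member(X, L),
    replace_all(L, R, T, T2).

% Replace every space or tab character by a newline character.
removewhite(List1, List2) :-
    replace_all([' ', '\t'], '\n', List1, List2).

% Hypothetical convenience wrapper working on atoms instead of char lists.
removewhite_atom(In, Out) :-
    atom_chars(In, Cs0),
    removewhite(Cs0, Cs),
    atom_chars(Out, Cs).
```

For instance, ?- atom_chars('a b\tc', Cs0), removewhite(Cs0, Cs). should bind Cs to [a, '\n', b, '\n', c].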