I'm trying to use a DCG to split a string into two parts separated by spaces. E.g. 'abc def' should give me back "abc" & "def". The program & DCG are below.
main:-
prompt(_, ''),
repeat,
read_line_to_codes(current_input, Codes),
(
Codes = end_of_file
->
true
;
processData(Codes),
fail
).
processData(Codes):-
(
phrase(data(Part1, Part2), Codes)
->
format('~s, ~s\n', [ Part1, Part2 ])
;
format('Didn''t recognize data.\n')
).
data([ P1 | Part1 ], [ P2 | Part2 ]) --> [ P1 | Part1 ], spaces(_), [ P2 | Part2 ].
spaces([ S | S1 ]) --> [ S ], { code_type(S, space) }, (spaces(S1); "").
This works correctly. But I found that having to type [ P1 | Part1 ]
& [ P2 | Part2 ]
was really verbose. So, I tried replacing all instances of [ P1 | Part1 ]
w/ Part1
& likewise w/ [ P2 | Part2 ]
in the definition of data
, i.e. the following.
data(Part1, Part2) --> Part1, spaces(_), Part2.
That's much easier to type, but that gave me an Arguments are not sufficiently instantiated
error. So it looks like an unbound variable isn't automatically interpreted as a list of codes in a DCG. Is there any other way to make this less verbose? My intent is to use DCG's where I would use regular expressions in other programming languages.
Your intuition is correct; the term-expansion procedure for DCGs (at least in SWI-Prolog, but should apply to others) with your modified version of data
gives the following:
?- listing(data).
data(A, D, B, F) :-
phrase(A, B, C),
spaces(_, C, E),
phrase(D, E, F).
As you can see, the variable Part1
and Part2
parts of your DCG rule have been interpreted into calls to phrase/3
again, and not lists; you need to explicitly specify that they are lists for them to be treated as such.
I can suggest an alternative version which is more general. Consider the following bunch of DCG rules:
data([A|As]) -->
spaces(_),
chars([X|Xs]),
{atom_codes(A, [X|Xs])},
spaces(_),
data(As).
data([]) --> [].
chars([X|Xs]) --> char(X), !, chars(Xs).
chars([]) --> [].
spaces([X|Xs]) --> space(X), !, spaces(Xs).
spaces([]) --> [].
space(X) --> [X], {code_type(X, space)}.
char(X) --> [X], {\+ code_type(X, space)}.
Take a look at the first clause at the top; the data
rule now attempts to match 0-to-many spaces (as many as possible, because of the cut), then one-to-many non-space characters to construct an atom (A
) from the codes, then 0-to-many spaces again, then recurses to find more atoms in the string (As
). What you end up with is a list of atoms which appeared in the input string without any spaces. You can incorporate this version into your code with the following:
processData(Codes) :-
% convert the list of codes to a list of code lists of words
(phrase(data(AtomList), Codes) ->
% concatenate the atoms into a single one delimited by commas
concat_atom(AtomList, ', ', Atoms),
write_ln(Atoms)
;
format('Didn''t recognize data.\n')
).
This version breaks a string apart with any number of spaces between words, even if they appear at the start and end of the string.