I'm trying to use a DCG to split a string into two parts separated by spaces. E.g. 'abc def' should give me back "abc" & "def". The program & DCG are below.
main :-
    prompt(_, ''),
    repeat,
    read_line_to_codes(current_input, Codes),
    (   Codes = end_of_file
    ->  true
    ;   processData(Codes),
        fail
    ).

processData(Codes) :-
    (   phrase(data(Part1, Part2), Codes)
    ->  format('~s, ~s\n', [ Part1, Part2 ])
    ;   format('Didn''t recognize data.\n')
    ).

data([ P1 | Part1 ], [ P2 | Part2 ]) --> [ P1 | Part1 ], spaces(_), [ P2 | Part2 ].
spaces([ S | S1 ]) --> [ S ], { code_type(S, space) }, (spaces(S1); "").
This works correctly. But I found that having to type [ P1 | Part1 ] & [ P2 | Part2 ] was really verbose. So, I tried replacing all instances of [ P1 | Part1 ] w/ Part1 & likewise w/ [ P2 | Part2 ] in the definition of data, i.e. the following.
data(Part1, Part2) --> Part1, spaces(_), Part2.
That's much easier to type, but that gave me an "Arguments are not sufficiently instantiated" error. So it looks like an unbound variable isn't automatically interpreted as a list of codes in a DCG. Is there any other way to make this less verbose? My intent is to use DCGs where I would use regular expressions in other programming languages.
Your intuition is correct. The term-expansion procedure for DCGs (at least in SWI-Prolog, though it should apply to other systems) translates the variables Part1 and Part2 in your modified version of data into further calls to phrase/3, not into lists, and calling phrase/3 with an unbound grammar body is what raises the instantiation error. You need to explicitly specify that they are lists for them to be treated as such.
I can suggest an alternative version which is more general. Consider the following bunch of DCG rules:
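A minimal sketch of such rules, reconstructed from the description in this answer rather than taken from a verified listing. The helper names chars//1, char//1 and space//1 are assumptions chosen for illustration, and spaces//1 is redefined here to match zero or more spaces, so it replaces the one-or-more version from the question:

```prolog
% data(-Atoms): the whitespace-separated words of the input, as atoms.
data([A|As]) -->
    spaces(_),                      % skip leading spaces (zero or more)
    chars([X|Xs]),                  % require at least one non-space code
    { atom_codes(A, [X|Xs]) },      % build the atom from those codes
    spaces(_),                      % skip trailing spaces (zero or more)
    data(As).                       % recurse for the remaining words
data([]) --> [].

% chars(-Codes): as many non-space codes as possible (the cut commits).
chars([X|Xs]) --> char(X), !, chars(Xs).
chars([]) --> [].

% spaces(-Codes): as many space codes as possible (the cut commits).
spaces([X|Xs]) --> space(X), !, spaces(Xs).
spaces([]) --> [].

% Single-code helpers (names are illustrative assumptions).
space(X) --> [X], { code_type(X, space) }.
char(X)  --> [X], { \+ code_type(X, space) }.
```

With these loaded in SWI-Prolog 7 or later, where backquoted text denotes a code list, a query such as phrase(data(As), `abc  def`) should bind As to [abc, def].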
Take a look at the first clause at the top; the data rule now attempts to match 0-to-many spaces (as many as possible, because of the cut), then one-to-many non-space characters to construct an atom (A) from the codes, then 0-to-many spaces again, then recurses to find more atoms in the string (As). What you end up with is a list of atoms which appeared in the input string without any spaces. This version breaks a string apart with any number of spaces between words, even if they appear at the start and end of the string.
You can incorporate this version into your code by calling the new data rule from processData.
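A minimal sketch of that incorporation, assuming a data//1 that yields a list of atoms as described above. Joining the atoms with atomic_list_concat/3 (an SWI-Prolog built-in) is one way to mimic the comma-separated output of the question's program; the variable names Atoms and Line are illustrative:

```prolog
% Assumes data//1 (producing a list of atoms) is already loaded.
processData(Codes) :-
    (   phrase(data(Atoms), Codes)
    ->  % Join the atoms with ', ' and print them on one line.
        atomic_list_concat(Atoms, ', ', Line),
        format('~w\n', [Line])
    ;   format('Didn''t recognize data.\n')
    ).
```

Because data//1 now returns a list, the same predicate handles any number of words, not just two. The main/0 loop from the question can stay as it is.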