I'm trying to create a sentence parser in Prolog. I want the sentence to be parsed into three separate lists that will be matched with a suggested outcome.
For example, here's the code I have come up with so far...
This is the vocabulary which will be used to parse certain words:
det(the).
det(a).
det(an).
noun(cat).
noun(boy).
noun(girl).
noun(grandfather).
noun(person).
noun(mat).
noun(party).
noun(book).
noun(problem).
noun(father).
noun(student).
noun(golf).
noun(conversation).
noun(challenge).
noun(person).
noun(problem).
noun(sat).
verb(thrives).
verb(loves).
verb(likes).
prep(on).
prep(with).
adj(big).
adj(lonely).
adj(elderly).
adj(teenage).
adj(good).
adj(fat).
sentence(Sentence,sentence(Noun_Phrase,Verb_Phrase)):-
np(Sentence,Noun_Phrase,Rem),
vp(Rem,Verb_Phrase).
Code for parsing:
np([X|T],np(det(X),NP2),Rem):- /* Det NP2 */
det(X),
np2(T,NP2,Rem).
np(Sentence,Parse,Rem):- np2(Sentence,Parse,Rem). /* NP2 */
np(Sentence,np(NP,PP),Rem):- /* NP PP */
np(Sentence,NP,Rem1),
pp(Rem1,PP,Rem).
np2([H|T],np2(noun(H)),T):- noun(H). /* Noun */
np2([H|T],np2(adj(H),Rest),Rem):- adj(H),np2(T,Rest,Rem).
pp([H|T],pp(prep(H),Parse),Rem):- /* PP NP */
prep(H),
np(T,Parse,Rem).
vp([H| []], vp(verb(H))):- /* Verb */
verb(H).
vp([H|T], vp(verb(H), Rem)):- /* VP PP */
vp(H, Rem),
pp(T, Rem, _).
vp([H|T], vp(verb(H), Rem)):- /* Verb NP */
verb(H),
np(T, Rem, _).
I then want to get the output from this with something like this...
?- sentence([a,very,young,boy,loves,a,manual,problem],Parse).
And then somehow suggest a present out of a fact like this...
present(construction_kit, subject(very,young,boy), verb(loves), object(manual,problem)).
present(a_golfing_sweater, subject(young,father), verb(loves), object(golf)).
As you can probably see, I would want to match this with the 'construction_kit' present.
Let's try and trim this down to a small problem so we can see how we might approach it. Let's start with a simpler grammar.
sentence(NP, VP) --> noun_phrase(NP), verb_phrase(VP).
noun_phrase(np(Noun, Adj)) --> det, adjectives(Adj), noun(Noun).
det --> [D], { det(D) }.
det --> [].
noun(N) --> [N], { noun(N) }.
adjectives([]) --> [].
adjectives([A|As]) --> adjective(A), adjectives(As).
adjective(A) --> [A], { adj(A) }.
verb_phrase(vp(Verb, Noun)) --> verb(Verb), noun_phrase(Noun).
verb(V) --> [V], { verb(V) }.
This is not a complete grammar even for your problem, but it will parse some sentences:
?- phrase(sentence(NP,VP), [a,teenage,boy,loves,a,big,problem]).
NP = np(boy, [teenage]),
VP = vp(loves, np(problem, [big]))
Now we can make a present database. I'm just going to consider the "head" of the noun phrase for now.
present('construction kit', boy, loves, problem).
Now you can deploy a very simple query to try and match them up:
?- phrase(sentence(np(Noun,_), vp(Verb, np(Object, _))),
[a,teenage,boy,loves,a,big,problem]),
present(Suggestion, Noun, Verb, Object).
Noun = boy,
Verb = loves,
Object = problem,
Suggestion = 'construction kit' ;
Now you can see immediately your biggest problem: if you try to match a more complex grammar, you're going to have to have a corresponding increase in the complexity of your query which turns that into a suggestion. The solution to this kind of problem, where two things appear to increase in complexity in concert, is always to create an abstraction between them, but that is out of scope for this question.
Edit It's worth noting that if you want to build a general recommendation system, this is not a modern approach. I would recommend the book Programming Collective Intelligence for a quick overview of several approaches, backed by fairly readable Python.
Edit Is it clear that making the grammar more complex will result in more complexity in matching presents?
For starters, I have omitted entirely your prepositional phrase production. I just didn't include that. If it were there, would it have bearing on the suggestion? Consider a sentence like "a teenage boy who loves to code loves a big problem"—the natural extension of my noun_phrase//1
to cover this case will miss out on an important detail of this child which might lead you to a different present than the construction set.
Supposing you want to handle that supplemental information, you will have to add more patterns to the first argument to phrase/3
. In fact, you'll probably accept whatever comes out and then dispatch to some kind of suggestion system.
One thing @TomasBy gets right is that we are using a very simple intermediate representation here, which is just (noun,verb,object). But if we make the grammar more complex, we will have to capture and handle more of it in this representation which will raise the complexity of the suggestion system. I don't see how to extend this simple representation to handle prepositional phrases, for instance, unless you just throw them away.
Converting the DCG back to ordinary Prolog
This is straightforward to do, but Prolog will do it for you. You can simply load the DCG above and use listing/0
to get the answer.
| ?- listing.
% file: user
sentence(A, B, C, D) :-
noun_phrase(A, C, E),
verb_phrase(B, E, D).
noun_phrase(np(A, B), C, D) :-
det(C, E),
adjectives(B, E, F),
noun(A, F, D).
det([A|B], C) :-
det(A),
B = C.
det(A, A).
noun(A, [A|B], C) :-
noun(A),
B = C.
adjectives([], A, A).
adjectives([A|B], C, D) :-
adjective(A, C, E),
adjectives(B, E, D).
adjective(A, [A|B], C) :-
adj(A),
B = C.
verb_phrase(vp(A, B), C, D) :-
verb(A, C, E),
noun_phrase(B, E, D).
verb(A, [A|B], C) :-
verb(A),
B = C.