Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence.
The relevant section of the book, the sub-chapter on dependency grammar, gives an example figure, but it doesn't show how to parse a sentence to come up with those relationships. Or maybe I'm missing something fundamental in NLP?
EDIT: I want something similar to what the Stanford parser does. Given the sentence "I shot an elephant in my sleep", it should return something like:
nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)
If you want to be serious about dependency parsing, don't use NLTK: its algorithms are dated and slow. Try something like spaCy instead: https://spacy.io/
We can use Stanford Parser from NLTK.
Requirements
You need to download two things from their website: the Stanford CoreNLP parser and the language model for your language.
Warning!
Make sure that your language model version matches your Stanford CoreNLP parser version!
The current CoreNLP version as of May 22, 2018 is 3.9.1.
After downloading the two files, extract the zip file anywhere you like.
Python Code
Next, load the model and use it through NLTK
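A minimal sketch of this step, assuming NLTK's `StanfordDependencyParser` wrapper from `nltk.parse.stanford` (newer NLTK releases mark it as deprecated in favour of the CoreNLP server interface). The two jar paths are placeholders for wherever you extracted the downloads, and `to_stanford_style` is a hypothetical helper added here only to print the result in the `rel(head, dependent)` form from the question.

```python
def to_stanford_style(triples):
    """Render (head, relation, dependent) triples as 'rel(head, dependent)' strings.

    Each head/dependent is a (word, tag) pair, the shape that
    DependencyGraph.triples() yields.
    """
    return [f"{rel}({head[0]}, {dep[0]})" for head, rel, dep in triples]


def parse_dependencies(sentence, path_to_jar, path_to_models_jar):
    """Parse one sentence and return its dependency triples.

    path_to_jar and path_to_models_jar are placeholders: point them at the
    parser jar and the models jar you downloaded.
    """
    # Imported lazily so the formatting helper works without NLTK installed.
    from nltk.parse.stanford import StanfordDependencyParser

    parser = StanfordDependencyParser(
        path_to_jar=path_to_jar,
        path_to_models_jar=path_to_models_jar,
    )
    # raw_parse() returns an iterator of DependencyGraph objects; take the first.
    graph = next(parser.raw_parse(sentence))
    # Each triple is ((head_word, head_tag), relation, (dep_word, dep_tag)).
    return list(graph.triples())
```

Calling `to_stanford_style(parse_dependencies("I shot an elephant in my sleep", jar, models_jar))` would then give one string per relation; the exact labels depend on the model version you downloaded.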
Output
The output of the last line is:
I think this is what you want.
If you need better performance, then spaCy (https://spacy.io/) is the best choice. Usage is very simple:
You'll get a dependency tree as output, and you can easily extract whatever information you need from it. You can also define your own custom pipelines. See more on their website.
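As a rough sketch of that usage, assuming spaCy is installed together with an English model (the model name `en_core_web_sm` is an assumption, not something this answer specifies). `dependency_rows` and `stanford_style` are hypothetical helper names; the second one only reformats the rows into the `rel(head-i, dependent-j)` layout from the question.

```python
def stanford_style(rows):
    """Format (relation, head_text, head_index, dep_text, dep_index) rows.

    Indices are 0-based token positions; the output uses 1-based positions,
    like the example in the question.
    """
    return [
        f"{rel}({head}-{head_i + 1}, {dep}-{dep_i + 1})"
        for rel, head, head_i, dep, dep_i in rows
    ]


def dependency_rows(sentence, model_name="en_core_web_sm"):
    """Parse a sentence with spaCy and return one row per token."""
    # Imported lazily so the formatter above works without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)  # the model must be downloaded beforehand
    doc = nlp(sentence)
    # token.dep_ is the relation label, token.head the governing token,
    # token.i the token's position in the document.
    return [(t.dep_, t.head.text, t.head.i, t.text, t.i) for t in doc]
```

Note that spaCy marks the root token as its own head with the label `ROOT`, so the formatted list contains one extra entry compared with the Stanford-style output, and the label set may differ from the one shown in the question.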
https://spacy.io/docs/usage/
From the Stanford Parser documentation: "the dependencies can be obtained using our software [...] on phrase-structure trees using the EnglishGrammaticalStructure class available in the parser package." http://nlp.stanford.edu/software/stanford-dependencies.shtml
The dependencies manual also mentions: "Or our conversion tool can convert the output of other constituency parsers to the Stanford Dependencies representation." http://nlp.stanford.edu/software/dependencies_manual.pdf
Neither feature seems to be implemented in NLTK currently.
To use Stanford Parser from NLTK
1) Run CoreNLP Server at localhost
Download Stanford CoreNLP here (and also model file for your language). The server can be started by running the following command (more details here)
or through the NLTK API (you need to set the CORENLP_HOME environment variable first).
2) Call the dependency parser from NLTK
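The two steps above can be sketched roughly as follows. This covers only the command-line route for starting the server. The port `9000`, the timeout, and the memory setting are assumed defaults, `corenlp_dir` is a placeholder for your extracted CoreNLP directory, and the function names are hypothetical. The parser call uses `CoreNLPDependencyParser` from `nltk.parse.corenlp`.

```python
def corenlp_server_command(port=9000, timeout_ms=15000, memory="4g"):
    """Build the java command that launches the CoreNLP server.

    It must be run from inside the extracted CoreNLP directory so that the
    "*" classpath picks up the jars there.
    """
    return [
        "java", f"-mx{memory}", "-cp", "*",
        "edu.stanford.nlp.pipeline.StanfordCoreNLPServer",
        "-port", str(port),
        "-timeout", str(timeout_ms),
    ]


def start_server(corenlp_dir, **options):
    """Start the server as a background process and return the process handle."""
    import subprocess

    # No shell is involved; Java expands the "*" classpath wildcard itself.
    return subprocess.Popen(corenlp_server_command(**options), cwd=corenlp_dir)


def parse_via_server(sentence, url="http://localhost:9000"):
    """Send one sentence to a running CoreNLP server and return CoNLL-style text."""
    # Imported lazily so the command builder works without NLTK installed.
    from nltk.parse.corenlp import CoreNLPDependencyParser

    parser = CoreNLPDependencyParser(url=url)
    # raw_parse() returns an iterator of DependencyGraph objects; take the first.
    graph = next(parser.raw_parse(sentence))
    # Four tab-separated columns per token: word, tag, head index, relation.
    return graph.to_conll(4)
```

The server needs a few seconds to load its models after `start_server` returns, so a request sent immediately may fail to connect.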
See the detailed documentation here, and also this question: NLTK CoreNLPDependencyParser: Failed to establish connection.
I think you could use a corpus-based dependency parser instead of the grammar-based one NLTK provides.
Doing corpus-based dependency parsing on even a small amount of text in Python is not ideal performance-wise, so NLTK provides a wrapper for MaltParser, a corpus-based dependency parser.
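A minimal sketch of calling that wrapper, assuming MaltParser and a pre-trained model have already been downloaded and Java is available. `parser_dirname` and `model_filename` are placeholders for your own installation, and `tokenize` is a deliberately naive whitespace splitter used only for illustration.

```python
def tokenize(sentence):
    """Naive whitespace tokenizer; swap in a real tokenizer for actual use."""
    return sentence.split()


def malt_parse(sentence, parser_dirname, model_filename):
    """Parse one sentence with MaltParser through NLTK's wrapper.

    parser_dirname: directory of the extracted MaltParser distribution.
    model_filename: a trained .mco model file.
    """
    # Imported lazily so tokenize() works without NLTK installed.
    from nltk.parse.malt import MaltParser

    parser = MaltParser(parser_dirname, model_filename)
    # parse_one() takes a list of tokens and returns a DependencyGraph.
    graph = parser.parse_one(tokenize(sentence))
    # tree() converts the graph into an nltk.Tree rooted at the head word.
    return graph.tree()
```

Because the wrapper runs MaltParser as an external Java process, each call carries noticeable startup overhead.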
You might find this other question about RDF representation of sentences relevant.