I know how to get bigram and trigram collocations using NLTK, and I apply them to my own corpora. The code is below.
However, I'm not sure about two things: (1) how do I get the collocations for a particular word, and (2) does NLTK have a collocation metric based on the log-likelihood ratio?
import nltk
from nltk.collocations import *
from nltk.tokenize import word_tokenize

text = "this is a foo bar bar black sheep foo bar bar black sheep foo bar bar black sheep shep bar bar black sentence"
trigram_measures = nltk.collocations.TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(word_tokenize(text))
# Print every trigram with its PMI score, highest score first
for i in finder.score_ngrams(trigram_measures.pmi):
    print(i)
Try this code:
It uses the likelihood-ratio measure and also filters out n-grams that don't contain the word 'creature'.
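The code for this answer is not shown here, so what follows is only a minimal sketch of the approach it describes, not the original snippet. It assumes the sample text from the question instead of the answerer's corpus, uses "sheep" as a stand-in for 'creature', and splits on whitespace so that no tokenizer data has to be downloaded. The `target` name is made up for illustration.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Sample text taken from the question; whitespace split is an assumption
# made here to avoid depending on downloaded tokenizer models.
text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")
tokens = text.split()

target = "sheep"  # hypothetical stand-in for 'creature'

bigram_measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens)

# apply_ngram_filter REMOVES every n-gram for which the function returns True,
# so returning True when the target is absent keeps only n-grams containing it.
finder.apply_ngram_filter(lambda *words: target not in words)

# Score the surviving bigrams with the likelihood-ratio measure.
scored = finder.score_ngrams(bigram_measures.likelihood_ratio)
for ngram, score in scored:
    print(ngram, score)
```

On a real corpus you would probably also call finder.apply_freq_filter(n) before scoring, to drop n-grams seen fewer than n times.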
As for question #2, yes! NLTK includes the likelihood ratio among its association measures, as likelihood_ratio on NgramAssocMeasures. The first question remains unanswered!
http://nltk.org/api/nltk.metrics.html?highlight=likelihood_ratio#nltk.metrics.association.NgramAssocMeasures.likelihood_ratio
Question 1 - Try:
The idea is to filter out whatever you don't want. This method is normally used to filter out words in specific parts of the ngram, and you can tweak that to your heart's content.
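As a rough illustration of filtering on a specific position in the n-gram, here is a sketch that keeps only trigrams whose middle word is the target. It assumes the sample text from the question and a whitespace split. The `target_word` value is an arbitrary choice for the example, and this is not necessarily the snippet the answer originally contained.

```python
from nltk.collocations import TrigramAssocMeasures, TrigramCollocationFinder

# Sample text taken from the question; whitespace split is an assumption.
text = ("this is a foo bar bar black sheep foo bar bar black sheep "
        "foo bar bar black sheep shep bar bar black sentence")
tokens = text.split()

target_word = "black"  # hypothetical: the word you want collocations for

trigram_measures = TrigramAssocMeasures()
finder = TrigramCollocationFinder.from_words(tokens)

# The filter receives the three words of each trigram; returning True drops it.
# Here a trigram survives only if its MIDDLE word is the target.
finder.apply_ngram_filter(lambda w1, w2, w3: w2 != target_word)

results = finder.score_ngrams(trigram_measures.likelihood_ratio)
for trigram, score in results:
    print(trigram, score)
```

To keep trigrams that contain the target in any position, change the condition to target_word not in (w1, w2, w3).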