I have the following two strings with their POS tags:
Sent1: "something like how writer pro or phraseology works would be really cool."
[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer',
'NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works',
'NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool',
'JJ'), ('.', '.')]
Sent2: "more options like the syntax editor would be nice"
[('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'),
('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'),
('nice', 'JJ')]
I am looking for a way to detect (return True) whether these strings contain the sequence "would" + "be" + adjective, regardless of the position of the adjective, as long as it comes after "would be". In the second string the adjective "nice" immediately follows "would be", but that is not the case in the first string.
The trivial case (no other word before the adjective; "would be nice") was solved in an earlier question of mine: detecting POS tag pattern along with specified words
I am now looking for a more general solution where optional words may occur before the adjective. I am new to NLTK and Python.
First install nltk_cli as per the instructions: https://github.com/alvations/nltk_cli
Then, here's a secret function in nltk_cli; maybe you'll find it useful:
alvas@ubi:~/git/nltk_cli$ cat infile.txt
something like how writer pro or phraseology works would be really cool .
more options like the syntax editor would be nice
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+ADJP infile.txt
would be really cool
would be nice
To illustrate other possible usage:
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+VP infile.txt
!!! NO CHUNK of VP+VP in this sentence !!!
!!! NO CHUNK of VP+VP in this sentence !!!
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 NP+VP infile.txt
how writer pro or phraseology works would be
the syntax editor would be
alvas@ubi:~/git/nltk_cli$ python senna.py --chunk2 VP+NP infile.txt
!!! NO CHUNK of VP+NP in this sentence !!!
!!! NO CHUNK of VP+NP in this sentence !!!
Then, if you want to check whether the phrase is in a sentence and output True/False, simply read and iterate through the outputs from nltk_cli and check them with if-else conditions.
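The if-else check described above might look roughly like the sketch below. It assumes you have already captured the output of `python senna.py --chunk2 VP+ADJP infile.txt` as text, one line per input sentence, in the format shown in the transcript. The names `NO_CHUNK_MARKER` and `has_would_be_chunk` are made up for this illustration and are not part of nltk_cli.

```python
# Prefix that the transcript above shows for sentences without a match.
NO_CHUNK_MARKER = "!!! NO CHUNK"

def has_would_be_chunk(output_line):
    """Return True if one line of VP+ADJP output holds 'would be' + more words."""
    line = output_line.strip()
    if not line or line.startswith(NO_CHUNK_MARKER):
        return False
    words = line.lower().split()
    # "would" directly followed by "be", with at least one more word after it
    return any(
        words[i] == "would" and words[i + 1] == "be"
        for i in range(len(words) - 2)
    )

# Lines copied from the VP+ADJP transcript above
captured = ["would be really cool", "would be nice"]
results = [has_would_be_chunk(line) for line in captured]
```

Since the chunker was asked for VP+ADJP, any words that follow "would be" in a returned chunk belong to the adjective phrase, which is why the sketch only looks at the words and not at tags.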
Would this help?
s1=[('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
flag = False  # becomes True once "would be" has been seen
for i, j in zip(s1[:-1], s1[1:]):
    if i[0] + " " + j[0] == "would be":
        flag = True
    if flag and (i[-1] == "JJ" or j[-1] == "JJ"):
        print("would be adjective found in the tagged string")
        break
It seems you could just search consecutive tokens for "would" followed by "be", and then look for the first instance of the tag "JJ" after them. Something like this:
import nltk
def has_would_be_adj(S):
    # make pos tags (word_tokenize splits off trailing punctuation such as "cool.")
    pos = nltk.pos_tag(nltk.word_tokenize(S))
    # Search consecutive tokens for "would", "be"
    j = None  # index of found "would"
    for i, (x, y) in enumerate(zip(pos[:-1], pos[1:])):
        if x[0] == "would" and y[0] == "be":
            j = i
            break
    if j is None or len(pos) < j + 2:
        return False
    a = None  # index of found adjective
    for i, (word, tag) in enumerate(pos[j + 2:]):
        if tag == "JJ":
            a = i + j + 2  # convert slice offset back to an index into pos
            break
    if a is None:
        return False
    print("Found adjective {} at {}".format(pos[a], a))
    return True
S = "something like how writer pro or phraseology works would be really cool."
print(has_would_be_adj(S))
I'm sure this could be written more compactly and cleanly, but it does what it says on the box :)
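For what it's worth, here is one possible compact version that works directly on an already tagged list like the ones in the question, so it does not call the tagger at all. The function name `would_be_then_adjective` is just an illustrative choice, and matching every tag that starts with "JJ" (so JJR and JJS count too) is an assumption about what should count as an adjective.

```python
def would_be_then_adjective(tagged):
    """True if "would be" is followed, anywhere later, by an adjective tag."""
    words = [word.lower() for word, _ in tagged]
    for i in range(len(words) - 1):
        if words[i] == "would" and words[i + 1] == "be":
            # Anything after a later "would be" is also after the first one,
            # so checking the first occurrence is enough.
            return any(tag.startswith("JJ") for _, tag in tagged[i + 2:])
    return False

s1 = [('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'),
      ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'),
      ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
s2 = [('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'),
      ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'),
      ('nice', 'JJ')]

print(would_be_then_adjective(s1))
print(would_be_then_adjective(s2))
```

Note that "more" (JJR) in the second sentence is ignored because it comes before "would be"; only tokens after the pair are inspected.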
from itertools import tee, dropwhile
import nltk
def check_sentence(S):
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)
    def not_would_be(word_group):
        # True for every pair that is NOT ("would", "be"), so dropwhile skips it
        first, second = word_group
        (would_word, _) = first
        (be_word, _) = second
        return would_word.lower() != "would" or be_word.lower() != "be"
    # Skip pairs until "would be" is reached, then look for an adjective tag
    for word_groups in dropwhile(not_would_be, pairwise(nltk.pos_tag(nltk.word_tokenize(S)))):
        first, second = word_groups
        (_, pos1) = first
        (_, pos2) = second
        if pos1 == "JJ" or pos2 == "JJ":
            return True
    return False
and then you can use the function like so:
S = "more options like the syntax editor would be nice."
check_sentence(S)
Check the linked Stack Overflow question. For the adjacent case ("would be" immediately followed by an adjective):
import nltk
from nltk.tokenize import word_tokenize
def would_be(tagged):
    # True only if some token triple is "would", "be", <word tagged JJ>
    return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in range(len(tagged) - 2))
S = "more options like the syntax editor would be nice."
pos = nltk.pos_tag(word_tokenize(S))
would_be(pos)
For the general case, where other words may come before the adjective, also check this code:
from nltk.tokenize import word_tokenize
import nltk
def checkTag(S):
    pos = nltk.pos_tag(word_tokenize(S))
    # Find "would" immediately followed by "be" ...
    for ind in range(len(pos) - 1):
        if pos[ind][0] == 'would' and pos[ind+1][0] == 'be':
            # ... and require an adjective somewhere after that pair
            if any(tag == 'JJ' for _, tag in pos[ind+2:]):
                return True
    return False
S = "something like how writer pro or phraseology works would be really cool."
print(checkTag(S))
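If accepting any words between "would be" and the adjective turns out to be too loose, a middle ground is to allow only adverbs in between (as in "would be really cool"). The sketch below encodes the tagged tokens as a "word/TAG" string and matches it with a regular expression from the standard `re` module. The pattern, the encoding, and the name `would_be_adverbs_adjective` are my own assumptions for illustration and do not come from NLTK.

```python
import re

# "would/MD be/VB", then zero or more adverbs (RB, RBR, RBS),
# then one adjective (JJ, JJR, JJS) as a whole token.
WOULD_BE_ADJ = re.compile(
    r"(?:^| )would/MD be/VB(?: \S+/RB[RS]?)* \S+/JJ[RS]?(?= |$)"
)

def would_be_adverbs_adjective(tagged):
    """True if "would be" is followed by optional adverbs and an adjective."""
    # Lowercase the words only; the tags keep their original case.
    encoded = " ".join("{}/{}".format(word.lower(), tag) for word, tag in tagged)
    return WOULD_BE_ADJ.search(encoded) is not None

tagged_cool = [('works', 'NNS'), ('would', 'MD'), ('be', 'VB'),
               ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]
tagged_nice = [('editor', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ')]
tagged_idea = [('it', 'PRP'), ('would', 'MD'), ('be', 'VB'),
               ('a', 'DT'), ('nice', 'JJ'), ('idea', 'NN')]

print(would_be_adverbs_adjective(tagged_cool))
print(would_be_adverbs_adjective(tagged_nice))
print(would_be_adverbs_adjective(tagged_idea))
```

The third example is rejected on purpose: the determiner "a" sits between "would be" and the adjective, so "nice" modifies "idea" rather than completing "would be".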