Here is a sample of the text file I am working with:
<Opera>
Tristan/NNP
and/CC
Isolde/NNP
and/CC
the/DT
fatalistic/NN
horns/VBZ
The/DT
passionate/JJ
violins/NN
And/CC
ominous/JJ
clarinet/NN
;/:
The capital letters after the forward slashes are tags. I want to be able to search the file for a tag sequence such as "NNP,CC,NNP" and have the program return "Tristan and Isolde" for this segment: the three consecutive words whose tags match those three tags in order.
The problem is that I want the search string to be supplied by the user, so it will be different every time.
I can read the file and find one match, but I do not know how to step back from that point to print the first word, or how to check whether the next tag also matches.
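A regular expression over the raw word/TAG text can find the matching run directly, and you can adapt it in the same way for other tag sequences.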
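As a minimal sketch for the fixed "NNP,CC,NNP" case, assuming tokens are separated by whitespace (newlines in the sample), one capturing group per word pulls out the three words in a single pass:

```python
import re

# Sample in the question's word/TAG format, one token per line.
text = """<Opera>
Tristan/NNP
and/CC
Isolde/NNP
and/CC
the/DT
fatalistic/NN
horns/VBZ
"""

# Capture the word in front of each tag; \s+ allows newlines or spaces between
# tokens, and \b stops the last tag from matching a longer tag such as NNPS.
pattern = re.compile(r"(\S+)/NNP\s+(\S+)/CC\s+(\S+)/NNP\b")

# findall returns one tuple of captured words per match.
matches = [" ".join(words) for words in pattern.findall(text)]
print(matches)  # ['Tristan and Isolde']
```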
EDIT: More generalized.
Build a regular expression dynamically from a list of tags you want to search:
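One way to do this, sketched below with hypothetical helper names (`build_tag_pattern`, `find_phrases`), is to split the user's comma-separated query into tags, escape each one with `re.escape` so tags like `:` are matched literally, and join the per-token pieces with `\s+`. The lookarounds `(?<!\S)` and `(?!\S)` keep each piece aligned to whole tokens, so `NN` does not also match `NNP`. Note that `finditer` returns non-overlapping matches only.

```python
import re

text = """<Opera>
Tristan/NNP
and/CC
Isolde/NNP
and/CC
the/DT
fatalistic/NN
horns/VBZ
The/DT
passionate/JJ
violins/NN
And/CC
ominous/JJ
clarinet/NN
;/:
"""

def build_tag_pattern(tags):
    """Compile a regex for consecutive word/TAG tokens carrying the given tags."""
    # Each piece captures one word and requires exactly this tag, bounded by
    # whitespace (or the string edges) on both sides.
    parts = [r"(?<!\S)(\S+)/" + re.escape(tag) + r"(?!\S)" for tag in tags]
    return re.compile(r"\s+".join(parts))

def find_phrases(text, tag_query):
    """Return the words of every run whose tags match a query like 'NNP,CC,NNP'."""
    tags = [t.strip() for t in tag_query.split(",") if t.strip()]
    pattern = build_tag_pattern(tags)
    return [" ".join(m.groups()) for m in pattern.finditer(text)]

print(find_phrases(text, "NNP,CC,NNP"))  # ['Tristan and Isolde']
print(find_phrases(text, "JJ,NN"))       # ['passionate violins', 'ominous clarinet']
```

For user input, the query can come straight from `input()`, e.g. `find_phrases(text, input("Tags (comma-separated): "))`, with `text` read from the file.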
Your source text looks as though it may have been produced by the Natural Language Toolkit (nltk).
Using nltk, you could tokenize the text, split each token into a (word, part_of_speech) tuple, and iterate through the n-grams to find those that match the pattern:
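nltk offers `nltk.tag.str2tuple` (split a `word/TAG` token) and `nltk.util.ngrams` (consecutive n-tuples) for these steps. The sketch below mirrors the same idea using only the standard library, so it runs without nltk installed; `parse_tagged`, `ngrams` and `match_tag_sequence` are hypothetical names. Unlike the regex approach, sliding an n-gram window also reports overlapping matches.

```python
text = """<Opera>
Tristan/NNP
and/CC
Isolde/NNP
and/CC
the/DT
fatalistic/NN
horns/VBZ
The/DT
passionate/JJ
violins/NN
And/CC
ominous/JJ
clarinet/NN
;/:
"""

def parse_tagged(text, sep="/"):
    """Split 'word/TAG' tokens into (word, tag) tuples, skipping lines like '<Opera>'."""
    pairs = []
    for token in text.split():
        # Split at the last separator, so ';/:' becomes (';', ':').
        word, found, tag = token.rpartition(sep)
        if found and word:
            pairs.append((word, tag))
    return pairs

def ngrams(seq, n):
    """Yield consecutive n-tuples of a list (stdlib stand-in for nltk.util.ngrams)."""
    return zip(*(seq[i:] for i in range(n)))

def match_tag_sequence(pairs, tags):
    """Return the words of every window whose tags equal `tags`, in order."""
    target = tuple(tags)
    results = []
    for gram in ngrams(pairs, len(target)):
        if tuple(tag for _, tag in gram) == target:
            results.append(" ".join(word for word, _ in gram))
    return results

query = "NNP,CC,NNP"  # e.g. from input()
pairs = parse_tagged(text)
print(match_tag_sequence(pairs, query.split(",")))  # ['Tristan and Isolde']
```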