I am working on a text search project and am using TextBlob to search for sentences in a text.
TextBlob efficiently pulls out all the sentences that contain the keywords. However, for effective research I also want to pull out one sentence before and one sentence after each match, which I have not been able to figure out.
Below is the code I am using:
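    from textblob import TextBlob

    def extract_sents(Text, word):
        # comma-separated keywords, stripped and lowercased to match the lowercased text
        search_words = set(w.strip().lower() for w in word.split(','))
        sents = ''.join([s.lower() for s in Text])
        blob = TextBlob(sents)
        # keep every sentence that shares at least one word with the keywords
        matches = [str(s) for s in blob.sentences if search_words & set(s.words)]
        print(search_words)
        print(matches)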
If you want to get the sentences before and after the match, you can either write a loop that remembers the previous sentence, or use slices like [from:to] on the blob.sentences list.
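The loop approach can be sketched like this. It is a minimal, TextBlob-free sketch: plain strings stand in for blob.sentences, and the helper words() is a hypothetical, crude stand-in for TextBlob's s.words (lowercase word tokens via a regex). A match is held as "pending" until the following sentence arrives, so each region holds the previous, matching and next sentence where they exist.

```python
import re

def words(sentence):
    # hypothetical stand-in for TextBlob's Sentence.words: lowercase word tokens
    return set(re.findall(r"\w+", sentence.lower()))

def context_by_loop(sentences, search_words):
    regions = []
    prev = None      # the sentence seen on the previous iteration
    pending = None   # a matched region still waiting for its next sentence
    for s in sentences:
        if pending is not None:
            # complete the waiting region with the current sentence
            pending.append(s)
            regions.append(pending)
            pending = None
        if search_words & words(s):
            # start a region with the previous sentence (if any) and the match
            pending = ([prev] if prev is not None else []) + [s]
        prev = s
    if pending is not None:
        # the last sentence matched, so there is no following sentence
        regions.append(pending)
    return regions

sentences = ["The cat sat.", "It was warm.", "A dog barked.", "The end."]
print(context_by_loop(sentences, {"cat", "dog"}))
```

This works, but the bookkeeping for the pending region is easy to get wrong, which is why index-based slicing is usually simpler.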
The best way might be to use the built-in enumerate function:
    match_region = [list(map(str, blob.sentences[i-1:i+2]))  # from previous sentence to next one
                    for i, s in enumerate(blob.sentences)     # i is the index, s is the element
                    if search_words & set(s.words)]           # same condition as in your code
Here, blob.sentences[i-1:i+2] extracts the sublist spanning from index i-1 (inclusive) to index i+2 (exclusive), i.e. the previous, the matching and the next sentence, and map(str, ...) turns the elements of this sublist into strings.
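The slice mechanics can be checked in isolation. In this small sketch, plain strings stand in for blob.sentences, and window() is a hypothetical helper that just returns the slice [i-1:i+2]:

```python
# plain strings stand in for blob.sentences (an assumption for illustration)
sentences = ["s0", "s1", "s2", "s3", "s4"]

def window(seq, i):
    """Return seq[i-1:i+2]: the items at i-1, i and i+1. Assumes i >= 1."""
    return seq[i - 1:i + 2]

print(window(sentences, 2))  # ['s1', 's2', 's3']
print(window(sentences, 4))  # ['s3', 's4']: stop index 6 is clipped to len(seq)
```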
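Note: you will probably want to replace i-1 with max(0, i-1). Otherwise, for a match in the first sentence, i-1 is -1, which Python interprets as the last element, so the slice starts at the end of the list and comes back empty (whenever the text has more than two sentences). If i+2 is greater than the list's length, on the other hand, there is no problem, because a slice that runs past the end of a list simply stops at the last element.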
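Here is a minimal sketch of the enumerate version with the max(0, i-1) clamp applied. As before, it assumes the sentences are plain strings rather than TextBlob Sentence objects, and words() is a hypothetical regex-based stand-in for s.words. The last line shows the unclamped slice for a match at index 0 coming back empty.

```python
import re

def words(sentence):
    # hypothetical stand-in for TextBlob's Sentence.words: lowercase word tokens
    return set(re.findall(r"\w+", sentence.lower()))

def match_regions(sentences, search_words):
    return [sentences[max(0, i - 1):i + 2]      # clamp start so i = 0 does not wrap to -1
            for i, s in enumerate(sentences)
            if search_words & words(s)]

sentences = ["The cat sat.", "It was warm.", "A dog barked.", "The end."]
print(match_regions(sentences, {"cat"}))  # [['The cat sat.', 'It was warm.']]
print(match_regions(sentences, {"dog"}))  # [['It was warm.', 'A dog barked.', 'The end.']]
print(sentences[0 - 1:0 + 2])             # [] : the unclamped slice for a match at i = 0
```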