This question already has an answer here:
- Python split text on sentences 9 answers
I wish to split text into sentences. Can anyone help me?
I also need to handle abbreviations. However my plan is to replace these at an earlier stage. Mr. -> Mister
import re
import unittest
class Sentences:
def __init__(self,text):
self.sentences = tuple(re.split("[.!?]\s", text))
class TestSentences(unittest.TestCase):
def testFullStop(self):
self.assertEquals(Sentences("X. X.").sentences, ("X.","X."))
def testQuestion(self):
self.assertEquals(Sentences("X? X?").sentences, ("X?","X?"))
def testExclaimation(self):
self.assertEquals(Sentences("X! X!").sentences, ("X!","X!"))
def testMixed(self):
self.assertEquals(Sentences("X! X? X! X.").sentences, ("X!", "X?", "X!", "X."))
Thanks, Barry
EDIT: To start with, I would be happy to satisfy the four tests I've included above. This would help me understand better how regexs work. For now I can define a sentence as X. etc as defined in my tests.