I have a transcript, and to analyze each speaker I need to add only their words to a string. The problem I'm having is that not every line starts with the speaker's name. Here's a snippet of my text file:
BOB: blah blah blah blah
blah hello goodbye etc.
JERRY:.............................................
...............
BOB:blah blah blah
blah blah blah
blah.
I want to collect only the words that the chosen speaker (in this case Bob) said, add them to a string, and exclude the words from Jerry and any other speakers. Any ideas for this?
Edit: There are line breaks between paragraphs and before any new speaker starts.
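Keep track of the current speaker in a variable such as current_speaker. Every time a line starts with a speaker label, update that variable; every line after it belongs to that speaker until the next label appears. Then decide what to do with each line based on who the current speaker is.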
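A minimal sketch of that idea, without any regex. It assumes a speaker label is an all-caps name immediately before the first colon on a line, as in the sample; the function name collect_speaker_words and the stand-in text for Jerry's lines are made up for illustration.

```python
def collect_speaker_words(lines, wanted):
    """Return everything `wanted` said as one space-joined string."""
    current_speaker = None
    collected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines only separate paragraphs
        head, sep, tail = line.partition(":")
        # Assumption: a label is an all-caps name right before the first colon.
        if sep and head.isupper() and head.replace(" ", "").isalpha():
            current_speaker = head
            line = tail.strip()  # keep the words that follow the label
        if current_speaker == wanted and line:
            collected.append(line)
    return " ".join(collected)


# Sample modelled on the question; Jerry's text is a placeholder.
transcript = """\
BOB: blah blah blah blah
blah hello goodbye etc.

JERRY: some other words
more of jerry

BOB:blah blah blah
blah blah blah
blah.
"""

bob_words = collect_speaker_words(transcript.splitlines(), "BOB")
print(bob_words)
```

For a real file you would pass the open file object instead of transcript.splitlines(), since iterating over a file also yields one line at a time.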
A regex is a good fit for spotting the speaker labels. Since you'll apply the same pattern to every line, compile it once up front and reuse the compiled object rather than passing the pattern string on each call.
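Here is one way this could look, as a sketch rather than a definitive version. The pattern assumes labels are all-caps names followed by a colon at the start of a line; SPEAKER_RE and words_by_speaker are illustrative names, and the sample lines are shortened stand-ins for the transcript. It groups the text of every speaker, so you can pick out whichever one you want to analyze.

```python
import re

# Compiled once, reused for every line.
# Assumption: a label is an all-caps name followed by a colon at line start.
SPEAKER_RE = re.compile(r"^([A-Z][A-Z ]*):\s*(.*)$")


def words_by_speaker(lines):
    """Map each speaker name to the text they said, joined by spaces."""
    speeches = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip the blank separator lines
        match = SPEAKER_RE.match(line)
        if match:
            # New speaker: remember the name, keep the rest of the line.
            current, line = match.group(1), match.group(2)
        if current is not None and line:
            speeches.setdefault(current, []).append(line)
    return {name: " ".join(parts) for name, parts in speeches.items()}


sample = [
    "BOB: blah blah",
    "hello goodbye",
    "",
    "JERRY: something else",
    "",
    "BOB:blah.",
]
print(words_by_speaker(sample)["BOB"])
```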
This outputs the following: