So I have a whole string (about 10k chars) and I am searching for a word (or many words) in that string with `regex(word).Matches(scrappedstring)`.
But how can I extract the whole sentence that contains that word? I was thinking of taking a substring after the searched word until the first dot/exclamation mark/question mark/etc. But how do I take the part of the sentence before the searched word?
Or maybe there's a better logic?
You can do that with a two-step process.
First you split the text into sentences, then you keep each sentence that contains the word.
Something like this:
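A minimal sketch of the two steps, written in Python for brevity. The function name `sentences_containing` and the sample text are made up for illustration, and the splitter assumes a sentence ends with `.`, `!` or `?` followed by whitespace, which is a simplification.

```python
import re

def sentences_containing(text, word):
    # Step 1: split after a sentence finisher (., ! or ?) followed by whitespace.
    # Assumption: this naive rule is good enough for the input at hand.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Step 2: keep the sentences that contain the word as a whole word,
    # ignoring case.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

text = "The cat sat on the mat. Dogs bark loudly! Is the cat asleep? Nobody knows."
print(sentences_containing(text, "cat"))
# ['The cat sat on the mat.', 'Is the cat asleep?']
```

To search for several words, the same filter can be run once per word, or the words can be joined into one alternation inside the pattern.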
You can get the substrings between sentence finishers (dot/exclamation mark/question mark/etc.) and search for the word in each sentence inside a loop.
Then return the substring when you find the matching word.
If your boundaries are e.g. `.`, `!`, `?` and `;`, match all sentences with the `[^.!?;]*(wordmatch)[^.!?;]*` expression. It will give you all sentences with the desired wordmatch inside. Example:
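A small sketch of that expression in use, shown in Python; the pattern itself does not rely on anything engine-specific. The function name `match_sentences` and the sample text are assumptions for illustration. Note that the match stops before the boundary character, so the terminator is not part of the result, and the word is matched as a plain substring.

```python
import re

def match_sentences(text, word):
    # Everything that is not a boundary, then the word, then again
    # everything that is not a boundary.
    pattern = re.compile(r"[^.!?;]*(" + re.escape(word) + r")[^.!?;]*")
    # group(0) is the whole sentence; strip the leading space left
    # over from the previous boundary.
    return [m.group(0).strip() for m in pattern.finditer(text)]

text = "I like tea. Coffee is fine; tea is better! Do you agree?"
print(match_sentences(text, "tea"))
# ['I like tea', 'tea is better']
```

If only whole words should count, wrapping the word in `\b...\b` inside the group is one way to tighten the match.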
Extract the sentences from the input. Then search for the specified word(s) within each sentence. Return the sentences where the word(s) are present.
Once you have a position, you would then read up to the next `.`, or the end of the file. But you also need to read backwards from the beginning of the word to a `.` or the beginning of the file. Those two positions mean you can then extract the sentence.
Note, it's not fool-proof in its simplest form as outlined above: `e.g.` would mean the sentence started after the `g.`, which is probably not the case.
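The scan in both directions could look roughly like this in Python. The function name `sentence_around`, the default terminator set and the sample strings are assumptions for illustration; only the first occurrence of the word is handled.

```python
def sentence_around(text, word, terminators=".!?"):
    pos = text.find(word)
    if pos == -1:
        return None
    # Read backwards: the last terminator before the word, or the start
    # of the text (rfind returns -1 when nothing is found, so start is 0).
    start = max(text.rfind(t, 0, pos) for t in terminators) + 1
    # Read forwards: the first terminator after the word, or the end of the text.
    ends = [i for i in (text.find(t, pos + len(word)) for t in terminators) if i != -1]
    end = min(ends) + 1 if ends else len(text)
    return text[start:end].strip()

text = "It rained all day. We stayed inside and read books. Then the sun came out"
print(sentence_around(text, "read"))  # We stayed inside and read books.
print(sentence_around(text, "sun"))   # Then the sun came out

# The caveat in action: the dot in "e.g." is taken for a sentence end.
print(sentence_around("Use a delimiter, e.g. a comma, to split words.", "comma"))
# a comma, to split words.
```

Handling abbreviations would need something extra, such as a list of known abbreviations to skip while scanning.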