In PHP, I'm trying to pull all sentences from a text that consist of, say, at least 5 words. Assuming sentences end with a full stop, question mark or exclamation mark, I came up with this:
/[\w]{5,*}[\.|\?|\!]/
Any ideas what's wrong?
Also, what needs to be done for this to work with UTF-8?
A method that doesn't use regex:
This outputs:
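A non-regex approach could look roughly like the sketch below. It is an assumption, not necessarily the code originally posted: the function name `longSentences`, its parameters and the sample text are all hypothetical. It walks the string byte by byte, cuts a sentence at every `.`, `?` or `!`, and keeps the sentence if it contains enough whitespace-separated words.

```php
<?php
// Hypothetical helper: returns the sentences of $text that contain
// at least $minWords whitespace-separated words.
function longSentences(string $text, int $minWords = 5): array
{
    $result = [];
    $start  = 0;
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        // A sentence is assumed to end at '.', '?' or '!'.
        if (strpos('.?!', $text[$i]) === false) {
            continue;
        }
        $sentence = trim(substr($text, $start, $i - $start + 1));
        $start    = $i + 1;

        // Turn tabs and newlines into spaces, split on spaces,
        // and drop the empty pieces left by repeated spaces.
        $words = array_filter(
            explode(' ', strtr($sentence, "\t\r\n", '   ')),
            'strlen'
        );
        if (count($words) >= $minWords) {
            $result[] = $sentence;
        }
    }
    return $result;
}

// Hypothetical sample text.
$text = 'Too short. This one has five words. Is this one long enough to match? No!';
print_r(longSentences($text));
// Keeps "This one has five words." and "Is this one long enough to match?"
```

Because the delimiters and the space are single ASCII bytes, which never occur inside a multibyte UTF-8 sequence, this byte-wise scan should also be safe for UTF-8 text. Trailing text without a closing delimiter is ignored.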
\w only matches a single character. A single word would be \w+. If you need at least 5 words, you could match at least 4 words, each followed by whitespace, followed by another word followed by a sentence delimiter.
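Two further details in the original pattern are worth noting: `{5,*}` is not a valid quantifier (the open-ended form is `{5,}`), and inside a character class `|` is a literal pipe, while `.`, `?` and `!` need no escaping there. A minimal sketch of the idea described above, using `preg_match_all`; the sample text is hypothetical and the exact pattern is one possible spelling of it:

```php
<?php
// Hypothetical sample text.
$text = 'Too short. This one has five words. Is this one long enough to match? No!';

// (\w+\s+){4,}  at least four words, each followed by whitespace
// \w+           one more word
// [.?!]         a sentence delimiter
preg_match_all('/(\w+\s+){4,}\w+[.?!]/', $text, $matches);

// $matches[0] holds the full matches.
print_r($matches[0]);
// Keeps "This one has five words." and "Is this one long enough to match?"
```

Since `\w` does not match punctuation, a sentence containing a comma or an apostrophe can only match from the word after that character onwards, so this is a rough filter rather than a full sentence splitter.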
I agree with the solution posted here. If you're using the preg functions in PHP, you can add the 'u' pattern modifier for this to work with UTF-8, for example:
/(\w+\s){4,}\w+[.?!]/u
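As a rough illustration of the modifier in use, the sketch below runs that pattern over a hypothetical French sample. It assumes the source file and the input string are valid UTF-8. Without `u`, the subject is matched byte by byte, so accented letters generally do not count as `\w` and break words apart.

```php
<?php
// Hypothetical sample text; this file is assumed to be saved as UTF-8.
$text = 'Où est la bibliothèque municipale? Très bien.';

// With the "u" modifier, pattern and subject are treated as UTF-8,
// so accented letters such as "ù" and "è" are matched by \w.
preg_match_all('/(\w+\s){4,}\w+[.?!]/u', $text, $matches);

print_r($matches[0]);
// Keeps "Où est la bibliothèque municipale?"; "Très bien." is too short.
```

If the subject is not valid UTF-8, the preg functions fail in `u` mode instead of returning matches, so it may be worth checking the return value of `preg_match_all` against `false`.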