PHP sentence boundary detection

Posted 2019-01-04 10:48

I would like to split a text into sentences in PHP. I'm currently using a regex, which gives me ~95% accuracy, and I would like to improve on that with a better approach. I've seen NLP tools that do this in Perl, Java, and C, but nothing that fits PHP. Do you know of such a tool?

7 answers
The star"
Reply #2 · 2019-01-04 11:53

As a low-tech approach, you might want to consider using a series of explode() calls in a loop, using ., !, and ? as the delimiters. This would be quite memory- and processor-intensive (as most text processing is): you would have a bunch of temporary arrays and one master array holding all found sentences, numerically indexed in the right order.
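A minimal sketch of that loop, assuming PHP 7 or later. The function name splitSentencesWithExplode() is hypothetical and not part of any library. Because explode() discards the delimiter, the sketch re-appends it to every piece except the last one in each temporary array.

```php
<?php
// Hypothetical helper (illustration only): splits $text on each terminator
// in turn and re-attaches the terminator that explode() removes.
function splitSentencesWithExplode(string $text, array $terminators = ['.', '!', '?']): array
{
    $sentences = [$text];                          // master array, kept in order
    foreach ($terminators as $terminator) {
        $next = [];
        foreach ($sentences as $chunk) {
            $parts = explode($terminator, $chunk); // temporary array
            $last  = count($parts) - 1;
            foreach ($parts as $i => $part) {
                // Every piece except the last was followed by the terminator.
                $piece = trim($i < $last ? $part . $terminator : $part);
                if ($piece !== '') {
                    $sentences_piece = $piece;
                    $next[] = $sentences_piece;
                }
            }
        }
        $sentences = $next;
    }
    return $sentences;
}

print_r(splitSentencesWithExplode('Hello world. How are you? Fine!'));
```

For the sample input this yields three entries: "Hello world.", "How are you?", and "Fine!". The sketch is deliberately naive: it splits after every terminator, so abbreviations, decimals, and ellipses come out as separate fragments.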

Also, you'd have to check for common exceptions (such as a . in titles like Mr. and Dr.), but with everything being in an array, these types of checks shouldn't be that bad.
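One way such an exception check could look, as a sketch only: walk the array of fragments and glue a fragment back onto the next one whenever it ends in a known abbreviation. The function name mergeAbbreviationSplits() and the abbreviation list are assumptions made for illustration; a real list would need to be much longer.

```php
<?php
// Hypothetical helper (illustration only): re-joins fragments that a naive
// splitter cut after a known abbreviation. The default list is an assumption.
function mergeAbbreviationSplits(
    array $fragments,
    array $abbreviations = ['Mr.', 'Mrs.', 'Ms.', 'Dr.', 'Prof.']
): array {
    $sentences = [];
    $buffer = '';
    foreach ($fragments as $fragment) {
        $buffer = $buffer === '' ? $fragment : $buffer . ' ' . $fragment;
        // Inspect the last whitespace-separated word collected so far.
        $words = preg_split('/\s+/', trim($buffer));
        $lastWord = end($words);
        if (!in_array($lastWord, $abbreviations, true)) {
            $sentences[] = $buffer;   // treated as a real sentence end
            $buffer = '';
        }
    }
    if ($buffer !== '') {
        $sentences[] = $buffer;       // input ended on an abbreviation
    }
    return $sentences;
}

print_r(mergeAbbreviationSplits(['Dr.', 'Smith met Mr.', 'Jones.', 'They talked.']));
```

For the sample fragments this produces "Dr. Smith met Mr. Jones." and "They talked.". The comparison is case-sensitive and exact, so a sentence that genuinely ends with an abbreviation would be merged with the one after it.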

I'm not sure if this is any better than regex in terms of speed and scaling, but it would be worth a shot. How big are these blocks of text you want to break into sentences?
