Extract sentences from HTML in PHP [duplicate]

Posted 2019-08-04 08:11

Question:

This question already has an answer here:

  • How do you parse and process HTML/XML in PHP? 30 answers

I'm working on a PHP project (using CodeIgniter) on text summarization, and for that I need to extract sentences from the content of a rich text box (the content includes HTML tags). Is there a proper method or CodeIgniter library for extracting sentences from content that contains HTML tags?

Answer 1:

The PHP function strip_tags() should help you. It returns the string with HTML and PHP tags removed. If you just need to count sentences, you could do count(explode(". ", $text)), where the delimiter ". " is the typical end of a sentence.

Plain, simple, and limited, but it doesn't require any libraries.
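
For illustration, here is a minimal sketch of the strip_tags() approach that returns the sentences themselves rather than only a count. The function name extract_sentences() and the sample HTML are made up for this example; they are not part of PHP or CodeIgniter. It assumes sentences end in ".", "!" or "?" followed by whitespace, so abbreviations such as "e.g." will be split incorrectly.

```php
<?php
// Hypothetical helper (not a PHP or CodeIgniter built-in): removes markup,
// then splits the remaining text on sentence-ending punctuation.
function extract_sentences(string $html): array
{
    // Insert a space before common block-level tags so that text from
    // adjacent paragraphs does not fuse together once the tags are removed.
    $spaced = preg_replace('/<\/?(p|div|br|li|h[1-6])\b/i', ' $0', $html);

    // Remove HTML and PHP tags, then decode entities such as &amp;.
    $text = strip_tags($spaced);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    if ($text === '') {
        return [];
    }

    // Split at whitespace that follows ".", "!" or "?" (assumed sentence ends).
    return preg_split('/(?<=[.!?])\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Sample input, invented for this example.
$html = '<p>PHP is a scripting language. It runs on the <b>server</b>!</p><p>Is it popular?</p>';
$sentences = extract_sentences($html);

print_r($sentences);
echo count($sentences), "\n";
```

Splitting with a lookbehind keeps the punctuation attached to each sentence, which explode(". ", $text) does not. For anything beyond simple input, a dedicated sentence tokenizer would likely be more reliable.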



Answer 2:

This technique is called web scraping.

Have a look at this