Converting HTML to plain text in PHP for e-mail

Posted 2019-01-02 15:28

I use TinyMCE to allow minimal formatting of text within my site. I'd like to convert the HTML it produces to plain text for e-mail. I've been using a class called html2text, but it's really lacking in UTF-8 support, among other things. I do, however, like that it maps certain HTML tags to plain-text formatting, such as putting underscores around text that was wrapped in <i> tags in the HTML.

Does anyone use a similar approach to convert HTML to plain text in PHP? If so, can you recommend any third-party classes I can use? Or how do you best tackle this issue?

14 answers
呛了眼睛熬了心
Post #2 · 2019-01-02 15:29
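function plainText($text)
{
    // Keep only the tags that should become line breaks,
    // then replace each remaining tag with a newline
    $text = strip_tags($text, '<br><p><li>');
    $text = preg_replace('/<[^>]*>/', PHP_EOL, $text);
    // Adjacent tags such as "</li><li>" leave empty lines; collapse them
    $text = preg_replace('/\R+/', PHP_EOL, $text);

    return trim($text);
}

$text = "string 1<br>string 2<br/><ul><li>string 3</li><li>string 4</li></ul><p>string 5</p>";

echo plainText($text);

Output: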
string 1
string 2
string 3
string 4
string 5

心情的温度
Post #3 · 2019-01-02 15:32

Use html2text (example HTML to text), licensed under the Eclipse Public License. It uses PHP's DOM methods to load from HTML, and then iterates over the resulting DOM to extract plain text. Usage:

// when installed using the Composer package
$text = Html2Text\Html2Text::convert($html);

// usage when installed using html2text.php
require('html2text.php');
$text = convert_html_to_text($html);

Although incomplete, it is open source and contributions are welcome.

Issues with other conversion scripts:

  • html2text (GPL) is not EPL-compatible.
  • lkessler's link (attribution) is incompatible with most open source licenses.
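
The DOM-walking approach this answer describes can be sketched roughly as below. This is not html2text's actual code: `html_to_text_sketch()` and `html_to_text_walk()` are hypothetical names, and the marker choices (underscores for `<i>`/`<em>`, asterisks for `<b>`/`<strong>`, `- ` for list items) are assumptions modelled on the question. The `<meta>` prefix is a common workaround for `DOMDocument::loadHTML()` assuming ISO-8859-1 when the markup declares no charset, which is one way the UTF-8 problem from the question shows up.

```php
<?php
// Hedged sketch of a DOM-walking HTML-to-text converter. Function names are
// hypothetical; this is not the html2text library's own implementation.

function html_to_text_sketch(string $html): string
{
    $doc = new DOMDocument();
    // loadHTML() assumes ISO-8859-1 unless the markup declares a charset,
    // so prepend a UTF-8 declaration to keep multibyte characters intact.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html,
        LIBXML_NOERROR | LIBXML_NOWARNING
    );

    $text = html_to_text_walk($doc->documentElement);
    // Remove spaces around line breaks, allow at most one blank line
    // between blocks, then trim the ends.
    $text = preg_replace('/[ \t]*\n[ \t]*/', "\n", $text);
    return trim(preg_replace("/\n{3,}/", "\n\n", $text));
}

function html_to_text_walk(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Collapse runs of whitespace inside text, as a browser would.
        return preg_replace('/\s+/u', ' ', $node->textContent);
    }
    if (!($node instanceof DOMElement)) {
        return ''; // skip comments, processing instructions, etc.
    }

    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= html_to_text_walk($child);
    }

    switch (strtolower($node->nodeName)) {
        case 'script':
        case 'style':
            return '';
        case 'i':
        case 'em':
            return '_' . $inner . '_';   // the mapping the question likes
        case 'b':
        case 'strong':
            return '*' . $inner . '*';
        case 'br':
            return "\n";
        case 'li':
            return "\n- " . trim($inner);
        case 'p':
        case 'div':
        case 'h1':
        case 'h2':
        case 'h3':
            return "\n\n" . trim($inner) . "\n\n";
        default:
            return $inner;               // html, body, span, a, ...
    }
}

echo html_to_text_sketch('<p>Hello <i>wörld</i></p><ul><li>one</li><li>two</li></ul>');
```

Because the walker works on libxml's parsed tree rather than on raw tags, unclosed or oddly nested tags are normalised before conversion; any element not listed in the `switch` just contributes its text.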
余生无你
Post #4 · 2019-01-02 15:33

I just found PHP's strip_tags() function, and it's working in my case.

I tried to convert the following HTML:

<p><span style="font-family: 'Verdana','sans-serif'; color: black; font-size: 7.5pt;">&nbsp;</span>Many  practitioners are optimistic that the eyeglass and contact lens  industry will recover from the recent economic storm. Did your practice  feel its affects?&nbsp; Statistics show revenue notably declined in 2008 and  2009. But interestingly enough, those that monitor these trends state  that despite the industry's lackluster performance during this time,  revenue has grown at an average annual rate&nbsp;of 2.2% over the last five  years, to $9.0 billion in 2010.&nbsp; So despite the downturn, how were we  able to manage growth as an industry?</p>

After applying strip_tags(), I got the following output:

&nbsp;Many  practitioners are optimistic that the eyeglass and contact lens  industry will recover from the recent economic storm. Did your practice  feel its affects?&nbsp; Statistics show revenue notably declined in 2008 and  2009. But interestingly enough, those that monitor these trends state  that despite the industry's lackluster performance during this time,  revenue has grown at an average annual rate&nbsp;of 2.2% over the last five  years, to $9.0 billion in 2010.&nbsp; So despite the downturn, how were we  able to manage growth as an industry?
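
Note that strip_tags() only removes tags: HTML entities such as &nbsp; stay in the text, and block elements like <p> are not turned into line breaks. A minimal sketch that decodes the entities afterwards, assuming UTF-8 output is wanted (the function name `strip_to_text()` is hypothetical):

```php
<?php
// Hedged sketch: strip tags, then decode the entities strip_tags() leaves
// behind. strip_to_text() is a hypothetical helper name.

function strip_to_text(string $html): string
{
    $text = strip_tags($html);
    // Decode &nbsp;, &amp;, &#039;, &auml; ... into UTF-8 characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0 (no-break space); use an ordinary space.
    $text = str_replace("\u{00A0}", ' ', $text);
    return trim($text);
}

echo strip_to_text('<p><span>&nbsp;</span>the industry&#039;s growth</p>');
```

Passing 'UTF-8' explicitly keeps the result valid UTF-8 regardless of the default_charset setting; replacing the no-break space is a judgment call for plain-text mail.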
梦醉为红颜
Post #5 · 2019-01-02 15:34

None of the existing solutions fit my case: converting simple HTML e-mails to simple plain-text files.

So I've opened up this repository; I hope it helps someone. It's MIT-licensed, by the way :)

https://github.com/RobQuistNL/SimpleHtmlToText

Example:

$myHtml = '<b>This is HTML</b><h1>Header</h1><br/><br/>Newlines';
echo (new Parser())->parseString($myHtml);

returns:

**This is HTML**
### Header ###


Newlines
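
The tag-to-marker mapping shown in this output can be approximated with an ordered table of regular expressions. This is a hedged sketch, not the SimpleHtmlToText implementation: `html_to_markers()` is a hypothetical name, and the markers are assumptions chosen to mirror the example above.

```php
<?php
// Hedged sketch of a table-driven tag-to-marker mapping. Not the
// SimpleHtmlToText code; html_to_markers() is a hypothetical name.

function html_to_markers(string $html): string
{
    // pattern => replacement, applied in this order
    $rules = [
        '#<(b|strong)\b[^>]*>(.*?)</\1>#is' => '**$2**',
        '#<(i|em)\b[^>]*>(.*?)</\1>#is'     => '_$2_',
        '#<h[1-6]\b[^>]*>(.*?)</h[1-6]>#is' => "\n### \$1 ###\n",
        '#<br\s*/?>#i'                      => "\n",
        '#</p>#i'                           => "\n\n",
    ];
    $text = preg_replace(array_keys($rules), array_values($rules), $html);

    // Drop any tags the table does not know about, then decode entities.
    return html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo html_to_markers('<b>This is HTML</b><h1>Header</h1><br/><br/>Newlines');
```

Regexes are workable for the simple, well-formed markup an editor like TinyMCE produces, but they do not understand nesting (for example a <b> inside another <b>), so a DOM parser is the safer choice for arbitrary HTML.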
残风、尘缘若梦
Post #6 · 2019-01-02 15:38

Markdownify worked wonderfully for me! Worth mentioning: it fully supports UTF-8, which was the main reason I was looking for an alternative to html2text (mentioned earlier in this thread).

美炸的是我
Post #7 · 2019-01-02 15:39

I ran into the same problem as the OP, and the solutions from the top answers above didn't work for my scenarios. See why at the end.

Instead, I found this helpful script, available under the GPL; to avoid confusion, let's call it html2text_roundcube:

It's actually a version of the already mentioned script http://www.chuggnutt.com/html2text.php, updated by Roundcube Mail.

Usage:

$h2t = new \Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');
echo $h2t->getText(); // prints Hello, "WORLD"

Why html2text_roundcube proved better than the others:

  • Script http://www.chuggnutt.com/html2text.php didn't work out of the box for named HTML entities (e.g. &auml;) or unpaired quotes (e.g. <p>25" Monitor</p>).

  • Script https://github.com/soundasleep/html2text had no option to hide links or group them at the end of the text, so a typical HTML page looks bloated with links in plain-text format. Also, customizing how the transformation is done is not as straightforward as simply editing an array, as it is in html2text_roundcube.
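
Grouping links at the end of the text boils down to replacing each <a href> with its text plus a reference number and listing the URLs after the body. Below is a hedged sketch using PHP's DOM extension; `text_with_link_list()` is a hypothetical name, and the `[n]` / `Links:` format is an assumption, not what either library emits.

```php
<?php
// Hedged sketch: replace each <a href> with "text [n]" and list the URLs
// after the body. text_with_link_list() is a hypothetical name; this is not
// html2text_roundcube's implementation.

function text_with_link_list(string $html): string
{
    $doc = new DOMDocument();
    // Declare UTF-8 so loadHTML() does not fall back to ISO-8859-1.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html,
        LIBXML_NOERROR | LIBXML_NOWARNING
    );

    $links = [];
    // getElementsByTagName() returns a live list, so copy it into an array
    // before replacing nodes in the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $href = $a->getAttribute('href');
        if ($href === '') {
            continue; // anchors without a target only contribute their text
        }
        $links[] = $href;
        $label = $doc->createTextNode($a->textContent . ' [' . count($links) . ']');
        $a->parentNode->replaceChild($label, $a);
    }

    $text = trim($doc->documentElement->textContent);

    if ($links !== []) {
        $text .= "\n\nLinks:\n";
        foreach ($links as $i => $href) {
            $text .= '[' . ($i + 1) . '] ' . $href . "\n";
        }
    }
    return rtrim($text);
}

echo text_with_link_list('<p>Read <a href="https://www.php.net/">the manual</a>.</p>');
```

Block-level line breaks are deliberately left out to keep the example short; in practice this step would run inside a fuller converter.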
