I use TinyMCE to allow minimal formatting of text within my site. From the HTML that's produced, I'd like to convert it to plain text for e-mail. I've been using a class called html2text, but it's really lacking in UTF-8 support, among other things. I do, however, like that it maps certain HTML tags to plain text formatting — like putting underscores around text that previously had <i> tags in the HTML.
Does anyone use a similar approach to converting HTML to plain text in PHP? And if so: Do you recommend any third-party classes that I can use? Or how do you best tackle this issue?
$text = "string 1<br>string 2<br/><ul><li>string 3</li><li>string 4</li></ul><p>string 5</p>";
echo planText($text);
Output:
string 1
string 2
string 3
string 4
string 5
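The answer calls planText() but does not show its body. The sketch below reproduces the output above with plain regular expressions. It is a hypothetical implementation, not the answerer's code, and it assumes that line breaks only need to come from <br> and from closing </p>, </li>, </ul>, </ol> and </div> tags:

```php
<?php
// Hypothetical body for planText(); the original answer only shows the call.
function planText(string $html): string
{
    // Turn <br>, <br/> and <br /> into newlines.
    $html = preg_replace('~<br\s*/?>~i', "\n", $html);
    // End each paragraph, list item, list or div with a newline too.
    $html = preg_replace('~</(p|li|ul|ol|div)>~i', "\n", $html);
    // Drop the remaining tags and decode entities (&amp;, &eacute;, ...) as UTF-8.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Trim every line and discard the empty ones.
    $lines = array_filter(array_map('trim', explode("\n", $text)), fn ($line) => $line !== '');
    return implode("\n", $lines);
}

$text = "string 1<br>string 2<br/><ul><li>string 3</li><li>string 4</li></ul><p>string 5</p>";
echo planText($text);
```

Dropping empty lines matches the sample output, but it also removes the blank line you might want between paragraphs in an e-mail.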
Use html2text (see its example of HTML-to-text output), licensed under the Eclipse Public License. It uses PHP's DOM methods to load the HTML and then iterates over the resulting DOM to extract plain text. Usage:
Although incomplete, it is open source and contributions are welcome.
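The DOM-walking approach described above can be sketched with PHP's built-in DOMDocument. This is not html2text's code: domToText() and walkNode() are hypothetical names, and the tag mappings (underscores for <i>/<em> as in the question, asterisks for <b>/<strong>, dashes for list items) are assumptions. The meta tag is prepended because DOMDocument::loadHTML() otherwise assumes ISO-8859-1 input, which garbles UTF-8 text:

```php
<?php
// Minimal sketch of the DOM-walking idea; domToText()/walkNode() are hypothetical names.
function domToText(string $html): string
{
    $doc = new DOMDocument();
    // The meta tag tells libxml to read the input as UTF-8 instead of ISO-8859-1.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $doc->loadHTML($meta . $html, LIBXML_NOERROR | LIBXML_NOWARNING);
    // Collapse runs of blank lines and trim the ends.
    return trim(preg_replace("/\n{3,}/", "\n\n", walkNode($doc)));
}

function walkNode(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Collapse HTML whitespace (newlines, indentation) to single spaces.
        return preg_replace('/\s+/u', ' ', $node->nodeValue);
    }
    $inner = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $inner .= walkNode($child);
        }
    }
    if (!$node instanceof DOMElement) {
        return $inner; // document, doctype, comments, etc.
    }
    switch (strtolower($node->nodeName)) {
        case 'head':
        case 'script':
        case 'style':
            return '';
        case 'i':
        case 'em':
            return '_' . $inner . '_'; // the underscore convention from the question
        case 'b':
        case 'strong':
            return '*' . $inner . '*';
        case 'br':
            return "\n";
        case 'li':
            return '- ' . trim($inner) . "\n";
        case 'p':
        case 'div':
        case 'ul':
        case 'ol':
            return "\n" . trim($inner) . "\n\n";
        default:
            return $inner;
    }
}

echo domToText('<p>Hello <i>world</i> and <b>Gr&uuml;&szlig;e</b></p><ul><li>one</li><li>two</li></ul>');
```

Because the parser decodes entities and PHP's DOM always returns UTF-8 strings, there is no separate entity-decoding step; adding a mapping for another tag is one more case in the switch.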
Issues with other conversion scripts:
I just found PHP's strip_tags() function, and it works in my case.
I tried to convert the following HTML:
After applying strip_tags(), I got the following output:
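The answer's own input and output were not preserved, so here is a small illustration with hypothetical sample HTML of what strip_tags() does and does not do. It removes tags but inserts nothing in their place, and it leaves entities encoded; html_entity_decode() handles the second part:

```php
<?php
// Hypothetical sample input.
$html = "<p>Hello <b>world</b></p><p>Caf&eacute; &amp; bar</p>";

// strip_tags() only removes the tags: the two paragraphs run together
// and the entities stay encoded.
$plain = strip_tags($html);

// html_entity_decode() turns the remaining entities into UTF-8 characters.
$decoded = html_entity_decode($plain, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($plain, $decoded);
```

For simple HTML this may be all you need; if block elements must end up on separate lines, insert newlines before stripping the tags.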
None of the existing solutions fit my case: converting simple HTML e-mails into simple plain-text files.
I've published this repository; I hope it helps someone. It's MIT-licensed, by the way :)
https://github.com/RobQuistNL/SimpleHtmlToText
Example:
returns:
Markdownify worked wonderfully for me! One thing worth mentioning: it fully supports UTF-8, which was the main reason I was looking for an alternative to html2text (mentioned earlier in this thread).
I ran into the same problem as the OP, and the solutions from the top answers above didn't work for my scenarios (see why at the end).
Instead, I found this helpful script (to avoid confusion, let's call it html2text_roundcube), available under the GPL. It's actually a version of an already-mentioned script, http://www.chuggnutt.com/html2text.php, updated by RoundCube Mail. Usage:
Why html2text_roundcube proved better than the others:
- The script at http://www.chuggnutt.com/html2text.php didn't work out of the box with special HTML entities (e.g. &auml;) or unpaired quotes (e.g. <p>25" Monitor</p>).
- The script at https://github.com/soundasleep/html2text had no option to hide links or group them at the end of the text, so a typical HTML page looks bloated with links in plain-text form. Customizing how it transforms the markup is also less straightforward than simply editing an array in html2text_roundcube.
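Grouping links at the end of the text can be approximated with DOMDocument. This is a rough sketch, not html2text_roundcube's implementation: textWithLinkList() is a hypothetical name, the "[n]" markers and "Links:" footer are an assumed format, and the body text is deliberately collapsed onto one line to keep the example short. The sample reuses the &auml; entity and the unpaired quote mentioned above; because a real HTML parser reads the markup, both come through intact:

```php
<?php
// Hypothetical helper illustrating "links grouped at the end"; not html2text_roundcube's code.
function textWithLinkList(string $html): string
{
    $doc = new DOMDocument();
    // Read the input as UTF-8 instead of loadHTML()'s ISO-8859-1 default.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $doc->loadHTML($meta . $html, LIBXML_NOERROR | LIBXML_NOWARNING);

    $links = [];
    // getElementsByTagName() is live, so copy it to an array before replacing nodes.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $href = $a->getAttribute('href');
        if ($href === '') {
            continue; // anchors without a target stay as plain text
        }
        $links[] = $href;
        // Replace the link with its text plus a numbered marker, e.g. "docs [1]".
        $marker = $doc->createTextNode($a->textContent . ' [' . count($links) . ']');
        $a->parentNode->replaceChild($marker, $a);
    }

    $body = $doc->getElementsByTagName('body')->item(0);
    // Entities such as &auml; were already decoded by the parser.
    $text = trim(preg_replace('/\s+/u', ' ', $body ? $body->textContent : ''));

    if ($links === []) {
        return $text;
    }
    $list = [];
    foreach ($links as $i => $href) {
        $list[] = '[' . ($i + 1) . '] ' . $href;
    }
    return $text . "\n\nLinks:\n" . implode("\n", $list);
}

echo textWithLinkList(
    '<p>See <a href="https://example.com/a">docs</a> for the 25" Monitor by J&auml;ger.</p>'
);
```

Hiding links entirely would just mean replacing each anchor with its text and skipping the footer.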