How to encode text outside the <pre></pre> tag with htmlentities()

Posted 2019-06-11 15:54

Question:

I'm trying to make my own BBCode parser for my website, and I'm looking for a way to apply htmlentities() to everything except the code inside PRE tags and the PRE tags themselves.

For example:

<b>Hello world</b> (outputs &lt;b&gt;Hello world&lt;/b&gt;)
<pre>"This must not be converted to HTML entities"</pre> (outputs <pre>"This must not be converted to HTML entities"</pre>)

I really have no idea how to do this.

Any kind of help would be appreciated :)

Thanks.

Answer 1:

You could convert the &lt;pre&gt; … &lt;/pre&gt; back to <pre> … </pre>:

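// escape everything
$str = htmlspecialchars($str);
// turn the escaped &lt;pre&gt; ... &lt;/pre&gt; delimiters back into real tags
// (the text between them stays escaped)
$str = preg_replace('/&lt;pre&gt;((?:[^&]+|&(?!lt;\\/pre&gt;))*)&lt;\\/pre&gt;/s', '<pre>$1</pre>', $str);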
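
If the text inside <pre> has to stay exactly as typed (the snippet above leaves it escaped), one alternative is to split the input on the <pre> blocks first and escape only the pieces in between. This is a minimal sketch of that different technique, not the method above; the helper name escape_outside_pre() is made up for illustration, and it assumes plain, non-nested <pre> tags without attributes.

```php
<?php
// Hypothetical helper (not part of the original answer): escape everything
// except <pre>...</pre> blocks, which are passed through untouched.
function escape_outside_pre(string $str): string
{
    // Split on <pre>...</pre> blocks. Because of the capture group and
    // PREG_SPLIT_DELIM_CAPTURE, the blocks themselves land at the odd indexes.
    $parts = preg_split('/(<pre>.*?<\/pre>)/is', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // Regex engine error: fall back to escaping the whole string.
        return htmlspecialchars($str);
    }

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes hold the text between <pre> blocks.
            $parts[$i] = htmlspecialchars($part);
        }
    }
    return implode('', $parts);
}

echo escape_outside_pre('<b>Hello world</b> <pre>"keep <i>this</i>"</pre>');
// prints: &lt;b&gt;Hello world&lt;/b&gt; <pre>"keep <i>this</i>"</pre>
```

Keep in mind that whatever sits inside <pre> reaches the browser unescaped with this approach, so it is only suitable when that content is trusted or cleaned separately.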


Answer 2:

If it's for practice, OK. But if you just want the feature, don't reinvent the wheel. Parsing is not an easy task, and there are plenty of mature parsers out there. I would look at the PEAR packages first: try HTML_BBCodeParser.

If you really want to do it yourself, you have two options:

  • regexp
  • state machines

Usually a mix of both is handy. But because tags can be nested and badly formed, this is really hard to code. At the very least, start from generic parser code and define your own lexical elements; written from scratch, the parser will take as long as the rest of the website.
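
To give an idea of the state-machine option, here is a minimal two-state scanner for the <pre> case from the question. It is only a sketch: the function name escape_with_state_machine() is made up for illustration, it assumes <pre> tags without attributes, and it does not handle nesting.

```php
<?php
// Hypothetical two-state scanner; the function name and states are
// illustrative and not taken from any library.
function escape_with_state_machine(string $str): string
{
    $out = '';
    $pos = 0;
    $len = strlen($str);
    $inPre = false; // state: are we currently inside a <pre> block?

    while ($pos < $len) {
        if (!$inPre) {
            $open = stripos($str, '<pre>', $pos);
            $close = ($open === false) ? false : stripos($str, '</pre>', $open + 5);
            if ($open === false || $close === false) {
                // No complete <pre>...</pre> block left: escape the rest,
                // including any unclosed <pre>.
                $out .= htmlspecialchars(substr($str, $pos));
                break;
            }
            // Escape the text before the tag, then switch state.
            $out .= htmlspecialchars(substr($str, $pos, $open - $pos)) . '<pre>';
            $pos = $open + 5;
            $inPre = true;
        } else {
            // A closing tag is guaranteed to exist by the check above.
            $close = stripos($str, '</pre>', $pos);
            $out .= substr($str, $pos, $close - $pos) . '</pre>';
            $pos = $close + 6;
            $inPre = false;
        }
    }
    return $out;
}

echo escape_with_state_machine('<b>a</b><pre>"x"</pre>z');
// prints: &lt;b&gt;a&lt;/b&gt;<pre>"x"</pre>z
```

A real BBCode parser needs many more states (one per tag type, plus a stack for nesting), which is where the effort mentioned above goes.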

Btw: using a BBCode parser does not free you from sanitizing the user input.

EDIT: I'm in a good mood today, so here is a snippet showing how to use HTML_BBCodeParser:

// if you don't know how to use PEAR, you'd better learn it quickly
// add the PEAR directory to the include path
// (PATH_SEPARATOR is ":" on Unix and ";" on Windows)
ini_set("include_path", ini_get("include_path") . PATH_SEPARATOR . "/usr/share/pear");
// include PEAR and the parser
require_once("PEAR.php");
require_once("HTML/BBCodeParser.php");

// you can tweak settings from an ini file
$config = parse_ini_file("BBCodeParser.ini", true);
$options = &PEAR::getStaticProperty("HTML_BBCodeParser", "_options");
$options = $config["HTML_BBCodeParser"];

// parsing starts here
$parser = new HTML_BBCodeParser();
$parser->setText($the_mighty_BBCode);
$parser->parse();
$parsed = $parser->getParsed();

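// don't forget to clean that
// (strip_tags() removes all markup and htmlspecialchars() escapes what is left,
// so this prints plain text only)
echo htmlspecialchars(strip_tags($parsed));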


Tags: php regex bbcode