I'm trying to parse strings that represent source code, something like this:
[code lang="html"]
<div>stuff</div>
[/code]
<div>stuff</div>
As you can see from my previous 20 questions, I tried to do it with PHP's regex functions, but ran into many problems, especially when the string is very big...
Does anyone know of a BBCode parser class written in PHP that I could use for this, instead of regexes?
What I need it to do is:
- be able to convert all content within [code] tags to HTML entities
- be able to run some kind of filter (a callback function of mine) only on content outside of the [code] tags
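To make the two requirements concrete, here is a minimal regex-free sketch of the behaviour I'm after. The function name `process_bbcode` is hypothetical, and it assumes [code] blocks are never nested and that anything starting with `[code` is an opening tag:

```php
<?php
// Hypothetical helper: escape the body of each [code] block and run $filter
// on everything outside of them. Uses plain string scanning, no regexes.
function process_bbcode(string $text, callable $filter): string
{
    $out = '';
    $pos = 0;
    while (($open = stripos($text, '[code', $pos)) !== false) {
        // End of the opening tag, e.g. the "]" in [code lang="html"]
        $openEnd = strpos($text, ']', $open);
        $close = $openEnd === false ? false : stripos($text, '[/code]', $openEnd);
        if ($close === false) {
            break; // unterminated block: the rest is treated as ordinary text
        }
        // Text before the block goes through the filter
        $out .= $filter(substr($text, $pos, $open - $pos));
        // Opening tag is copied unchanged
        $out .= substr($text, $open, $openEnd - $open + 1);
        // Block body is converted to HTML entities
        $out .= htmlspecialchars(substr($text, $openEnd + 1, $close - $openEnd - 1));
        // Closing tag is copied unchanged
        $out .= substr($text, $close, strlen('[/code]'));
        $pos = $close + strlen('[/code]');
    }
    // Whatever follows the last block is ordinary text too
    return $out . $filter(substr($text, $pos));
}
```

Calling it as `process_bbcode($content, 'strip_tags')` would strip tags outside the blocks and leave the escaped code inside them.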
Thank you.
Edit: I ended up doing this:
- Convert all <pre> and <code> tags to [pre] and [code]:
$content = str_replace(array('<pre>', '</pre>', '<code>', '</code>'), array('[pre]', '[/pre]', '[code]', '[/code]'), $content);
- Get the contents between [code]...[/code] and [pre]...[/pre] and do the HTML entity conversion:
$content = preg_replace_callback('/(.?)\[(pre|code)\b(.*?)(?:(\/))?\](?:(.+?)\[\/\2\])?(.?)/s', 'self::specialchars', $content);
(I stole this pattern from the WordPress shortcode functions :)
- Store the entity-converted content in a temporary array variable, and replace the block in $content with a unique ID.
- I can now safely run my filter on $content, because there's no code in it, just the IDs (this filter does a strip_tags on the entire text and converts things like http://blabla.com to links).
- Replace the unique IDs in $content with the converted code blocks from the array variable.
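Roughly, the placeholder steps above look like the sketch below. This is a simplified illustration, not my exact code: the function name `filter_outside_code` and the token format are made up for the example, and the pattern is a shorter one than the WordPress pattern, assuming non-nested blocks with a matching closing tag.

```php
<?php
// Hypothetical sketch of the placeholder approach: hide code blocks behind
// unique tokens, filter the remaining text, then put the blocks back.
function filter_outside_code(string $content, callable $filter): string
{
    $blocks = [];

    // Swap every [code]/[pre] block for a token and stash the escaped version.
    $replaced = preg_replace_callback(
        '/\[(pre|code)\b([^\]]*)\](.*?)\[\/\1\]/is',
        function (array $m) use (&$blocks): string {
            // Alphanumeric token, so strip_tags and URL linking leave it alone
            $id = 'CODEBLOCK' . count($blocks) . 'X' . bin2hex(random_bytes(4));
            $blocks[$id] = '[' . $m[1] . $m[2] . ']'
                . htmlspecialchars($m[3])
                . '[/' . $m[1] . ']';
            return $id;
        },
        $content
    );
    if ($replaced === null) {
        // preg_* functions return null on failure, e.g. when a very large
        // input exceeds pcre.backtrack_limit
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    // The text now contains only tokens where the code used to be.
    $filtered = $filter($replaced);

    // Restore the escaped blocks in place of their tokens.
    return $blocks ? strtr($filtered, $blocks) : $filtered;
}
```

For example, `filter_outside_code($content, 'strip_tags')` strips tags everywhere except inside the blocks. Checking for a `null` return matters with big strings, since a failed match would otherwise silently wipe out the content.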
Do you think this approach is OK?