BB-Code-RegEx in javascript

I have this piece of code:

var s_1 = 'blabla [size=42]the answer[/size] bla bla blupblub';
var s_2 = 'blabla [size=42]the answer[/size] bla bla blupblub [size=32] 32 [/size]';

alert('Test-String:\n' + s_1 + '\n\nReplaced:\n' + size(s_1));
alert('Test-String:\n' + s_2 + '\n\nReplaced:\n' + size(s_2));


function size(s) {
    var reg = /\[size=(\d{1,2})\]([\u0000-\uFFFF]+)\[\/size\]/gi;
    s = s.replace(reg, function(match, p1, p2) {
        return '<span style="font-size: ' + ((parseInt(p1) > 48) ? '48' : p1) + 'px;">' + p2 + '</span>';
    })
    return s;    
}

It's supposed to replace all occurrences of the "[size=nn][/size]"-Tags but it only replaces the outer ones. I can't figure out how to replace all of them. (Please don't recommend to use a PHP-Script, I'd like to have a live-preview for the BB-Code formated Text)

Test it

回答1:

Matching (possibly nested) BBCode tags

A multi-pass approach is required if the elements are nested. This can be accomplished in one of two ways; matching from the inside out (no recursive expression required), or from the outside in (which requires a recursive expression). (See also my answer to a similar question: PHP, nested templates in preg_replace) However, since the Javascript regex engine does not support recursive expressions, the only way to (correctly) do this using regex is from the inside out. Below is a tested function which replaces BBCode SIZE tags with SPAN html tags from the inside out. Note that the (fast) regex below is complex (for one thing it implements Jeffrey Friedl's "unrolling-the-loop" efficiency technique - See: Mastering Regular Expressions (3rd Edition) for details), and IMO all complex regexes should be thoroughly commented and formatted for readability. Since Javascript has no free-spacing mode, the regex below is first presented fully commented in PHP free-spacing mode. The uncommented js regex actually used is identical to the verbose commented one.

Regex to match innermost of (possibly nested) SIZE tags:

// Regular expression in commented (PHP string) format.
$re = '% # Match innermost [size=ddd]...[/size] structure.
    \[size=            # Literal start tag name, =.
    (\d+)\]            # $1: Size number, ending-"]".
    (                  # $2: Element contents.
      # Use Friedls "Unrolling-the-Loop" technique:
      #   Begin: {normal* (special normal*)*} construct.
      [^[]*            # {normal*} Zero or more non-"[".
      (?:              # Begin {(special normal*)*}.
        \[             # {special} Tag open literal char,
        (?!            # but only if NOT start of
          size=\d+\]   # [size=ddd] open tag
        | \/size\]     # or [/size] close tag.
        )              # End negative lookahead.
        [^[]*          # More {normal*}.
      )*               # Finish {(special normal*)*}.
    )                  # $2: Element contents.
    \[\/size\]         # Literal end tag.
    %ix';

Javascript function: `parseSizeBBCode(text)`

function parseSizeBBCode(text) {
    // Here is the same regular expression in javascript syntax:
    var re = /\[size=(\d+)\]([^[]*(?:\[(?!size=\d+\]|\/size\])[^[]*)*)\[\/size\]/ig;
    while(text.search(re) !== -1) {
        text = text.replace(re, '<span style="font-size: $1pt">$2</span>');
    }
    return text;
}

Example input:

r'''
[size=10] size 10 stuff
    [size=20] size 20 stuff
        [size=30] size 30 stuff [/size]
    [/size]
[/size]
'''

Example output:

r'''
<span style="font-size: 10pt"> size 10 stuff
    <span style="font-size: 20pt"> size 20 stuff
        <span style="font-size: 30pt"> size 30 stuff </span>
    </span>
</span>
'''

Disclaimer - Don't use this solution!

Note that using regex to parse BBCode is fraught with peril! (There are a lot of "gotchas" not mentioned here.) Many would say that it is impossible. However, I would strongly disagree and have in fact written a complete BBCode parser (in PHP) which uses recursive regular expressions and works quite nicely (and is fast). You can see it in action here: New 2011 FluxBB Parser (Note that it uses some very complex regexes not for the faint of heart).

But in general, I would strongly warn against parsing BBCode using regex unless you have a very deep and thorough understanding of regular expressions (which can be gained from careful study and practice of Friedl's masterpiece). In other words, if you are not a master of regex (i.e. a regex guru), steer clear from using them for any but the most trivial of applications.

回答2:

var reg = /\[size=(\d{1,2})\]([\u0000-\uFFFF]+?)\[\/size\]/gi;
                                              ↑
                                    make it lazy (non-greedy)

which is same as

var reg = /\[size=(\d{1,2})\](.+?)\[\/size\]/gi;

回答3:

You need to use a non-greedy expression, qualified by ?, after the +.

Demo

回答4:

I'd try to use ([\u0000-\uFFFF]+?) ~~(untested)~~, this tells to stop at the first occurrence of [/size] instead of going straight to the last one

EDIT:

Yup, tested, seems ok