BBcode regex for **bold text**

Posted 2020-07-11 09:33

I'm terrible with regex, but I've had a try and a Google (and even looked in reddit's source) and I'm still stuck so here goes:

My aim is to match the following 'codes' and replace them with the HTML tags. It's just the regex I'm stuck with.

**bold text**
_italic text_
~hyperlink~

Here's my attempt at the bold one:

^\*\*([.^\*]+)\*\*$

Why isn't this working? I'm using the preg syntax.

Tags: regex
4 answers
Post #2 · 2020-07-11 10:04

Here is another regexp: \*\*((?:[^*]|\*(?!\*))*)\*\*

Example in Perl:

my %tag2re = (b => <<'RE_BOLD', i => '_([^_]*)_');
  \*\*(      # begin bold
    (?:[^*]  # non-star
    |        # or
    \*(?!\*) # single star
    )*       # zero or more times
  )\*\*      # end bold
RE_BOLD

my $text = <<BBCODE;
before  **bold and _italic_ *text
2nd line** after _just
           italic_ 
****
**tag _soup** as a result_
BBCODE

while (my ($tag, $re) = each %tag2re) {
    $text =~ s~$re~<$tag>$1</$tag>~gsx;
}
print $text;

It prints:

before  <b>bold and <i>italic</i> *text
2nd line</b> after <i>just
           italic</i> 
<b></b>
<b>tag <i>soup</b> as a result</i>

Or as html:

before  bold and italic *text
2nd line after just
           italic 

tag soup as a result

Stack Overflow's interpretation is:

before bold and italic *text 2nd line after just italic


tag soup as a result
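
For anyone not using Perl, here is a rough sketch of the same bold pattern in Python. The `re` module is only a stand-in here (the question uses preg), and the sample text is a shortened version of the input above. The pattern uses nothing beyond a negated character class and a negative lookahead, both of which PCRE and Perl also support.

```python
import re

# Same pattern as in the Perl example, written with re.VERBOSE
# so that it can carry comments.
BOLD = re.compile(r"""
    \*\*(            # opening marker
      (?:[^*]        # any non-star character (newlines included)
      |  \*(?!\*)    # or a single star not followed by another star
      )*             # zero or more times
    )\*\*            # closing marker
""", re.VERBOSE)

text = "before **bold and *text\n2nd line** after ****"
result = BOLD.sub(r"<b>\1</b>", text)
print(result)
# before <b>bold and *text
# 2nd line</b> after <b></b>
```

Because `[^*]` also matches a newline, the bold span can run across lines without any extra flag, and a lone `*` inside the span is left alone.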

够拽才男人
Post #3 · 2020-07-11 10:14

use:

\*\*(.[^*]*)\*\*

explanation:

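\*\*      // match two *'s
(.        // match any single character (this one may even be a *)
[^*]*     // then zero or more characters that are not a *
)         // end of the captured group
\*\*      // match two *'s

In a character class "[ ]", "^" is only significant if it's the first character (where it negates the class), and "." is just a literal dot. So [.^\*]+ from the question only matches runs of dots, carets and asterisks. By contrast, (.*) matches anything, and (.[^*]*) matches any one character followed by everything up to the next literal *.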

Edit: in response to the comments, to allow an asterisk inside the bold text (i.e. **bold *text**), you'd have to use a non-greedy match:

\*\*(.*?)\*\*

Character classes are more efficient than non-greedy matches, but it's not possible to group within a character class (see "Parentheses and Backreferences...")
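
A quick way to see the difference between the two patterns is to run both against the same strings. This is only a sketch in Python (standing in for preg; the variable names and sample strings are made up for illustration):

```python
import re

# Inside [...] the dot and a non-leading caret are literals, so the
# class from the question matches only '.', '^' and '*'.
question_class = re.compile(r"[.^\*]+")
print(bool(question_class.fullmatch(".^*")))   # True
print(bool(question_class.fullmatch("bold")))  # False

char_class = re.compile(r"\*\*(.[^*]*)\*\*")   # negated character class
lazy = re.compile(r"\*\*(.*?)\*\*")            # non-greedy match

simple = "**bold**"
tricky = "a **bold *text** b"                  # single * inside the bold span

simple_result = char_class.sub(r"<b>\1</b>", simple)
class_result = char_class.sub(r"<b>\1</b>", tricky)
lazy_result = lazy.sub(r"<b>\1</b>", tricky)

print(simple_result)  # <b>bold</b>
print(class_result)   # a **bold *text** b   (no match, text unchanged)
print(lazy_result)    # a <b>bold *text</b> b
```

Note that `.` does not match a newline by default, so the non-greedy version needs the "dot matches newline" flag (`re.DOTALL` in Python, the `s` modifier in preg) if bold text can span several lines.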

甜甜的少女心
Post #4 · 2020-07-11 10:18
\*\*(.*?)\*\*

that will work for the bold text.

For the others, just replace each \*\* in the pattern with _ or ~
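
Put together, the three replacements might look something like the sketch below (Python used as a stand-in for preg). The `RULES` table and `to_html` helper are hypothetical names, and the `<a href>` output is an assumption, since the question doesn't say what HTML a ~hyperlink~ should turn into.

```python
import re

# (pattern, replacement) pairs, applied in order.
# The <a> markup is a guess at what the asker wants for ~hyperlink~.
RULES = [
    (re.compile(r"\*\*(.*?)\*\*"), r"<b>\1</b>"),
    (re.compile(r"_(.*?)_"), r"<i>\1</i>"),
    (re.compile(r"~(.*?)~"), r'<a href="\1">\1</a>'),
]

def to_html(text):
    """Apply each rule in turn to the whole text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(to_html("**bold text** _italic text_ ~http://example.com~"))
# <b>bold text</b> <i>italic text</i> <a href="http://example.com">http://example.com</a>
```

One caveat: the rules know nothing about each other, so a URL containing two underscores would be picked up by the italic rule as well.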

一夜七次
Post #5 · 2020-07-11 10:25

First of all, get rid of the ^ and the $. With those, the pattern will only match a string that starts with ** and ends with **. Second, use a non-greedy (lazy) quantifier to match as little text as possible, instead of making a character class for all characters other than asterisks.

Here's what I suggest:

\*\*(.+?)\*\*
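
Both points are easy to check on a line with two bold spans. A small sketch in Python (standing in for preg; the sample line is made up):

```python
import re

line = "some **bold** and **more bold** text"

# With ^ and $ the whole line must start and end with **, so this fails.
anchored = re.search(r"^\*\*(.+?)\*\*$", line)
print(anchored)  # None

# Greedy .+ runs to the LAST ** and swallows both spans in one match.
greedy = re.findall(r"\*\*(.+)\*\*", line)
print(greedy)    # ['bold** and **more bold']

# Lazy .+? stops at the first closing ** each time.
lazy = re.findall(r"\*\*(.+?)\*\*", line)
print(lazy)      # ['bold', 'more bold']
```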