Regex & BBCode - Perfecting Nested Quote

Posted 2020-02-29 10:50

I'm working on some BBcode for my website.

I've managed to get most of the codes working perfectly; however, the [QUOTE] tag is giving me some grief.

When I get something like this:

[QUOTE=1]
[QUOTE=2]
This is a quote from someone else
[/QUOTE]
This is someone else quoting someone else
[/QUOTE]

It will return:

> 1 said:  [QUOTE=2]This is a quote from
> someone else

This is someone else quoting someone else[/QUOTE]

So what is happening is that the [/QUOTE] from the nested quote is closing the outer quote block.

The regex I am using is:

"'\[quote=(.*?)\](.*?)\[/quote\]'is"

How can I make it so nested Quotes will appear properly?

Thank you.

Tags: php regex bbcode
2 answers
祖国的老花朵
#2 · 2020-02-29 11:12

You could construct a recursive regular expression (available since libpcre-3.0 according to their changelog):

\[quote=(.*?)\](((?R)|.)*?)\[\/quote\]

But it would be better if you followed @codeka's advice.

Update: (?R) here means «insert the whole regular expression at the place where (?R) occurs». So a(?R)?b is equivalent (if you ignore capturing groups) to a(a(?-1)?b)?b, which is equivalent to a(a(a(?-1)?b)?b)?b, and so on. Instead of (?R) you can use (?N), (?+N), (?-N) and (?&a), which mean «substitute with the N'th capturing group», «substitute with the N'th next capturing group», «substitute with the N'th previous capturing group» and «substitute with the capturing group named «a»» respectively.
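To make the recursion concrete, here is a minimal PHP sketch that applies the pattern above with preg_replace_callback. The function name render_quotes and the blockquote/cite markup are assumptions for illustration only, not something from the question, and the sketch assumes the text has already been HTML-escaped.

```php
<?php
// Hypothetical helper: converts (possibly nested) [quote=NAME]...[/quote] blocks to HTML.
// Assumes $text has already been HTML-escaped elsewhere.
function render_quotes(string $text): string
{
    // Group 1 is the author; group 2 is the body. (?R) lets the body contain
    // complete quote blocks, so the outer match ends at its own [/quote].
    $pattern = '/\[quote=(.*?)\](((?R)|.)*?)\[\/quote\]/is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // The regex only delimits the outermost block, so call the helper
            // again on the body to convert any quotes nested inside it.
            $body = render_quotes($m[2]);
            return '<blockquote><cite>' . $m[1] . ' said:</cite> ' . $body . '</blockquote>';
        },
        $text
    );

    // preg_replace_callback returns null on a PCRE error (e.g. backtrack limit);
    // fall back to the untouched input in that case.
    return $result ?? $text;
}

$input = "[QUOTE=1][QUOTE=2]This is a quote from someone else[/QUOTE]"
       . "This is someone else quoting someone else[/QUOTE]";
echo render_quotes($input), "\n";
```

The inner block ends up wrapped inside the outer one instead of closing it early. On long or malformed input a pattern like this can hit PCRE's backtracking or recursion limits, which is one more reason to prefer the tokenizing approach.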

在下西门庆
#3 · 2020-02-29 11:18

This is not really a task that regular expressions are good for. It's almost like trying to parse HTML with regular expressions, and we know what happens when you do that...

What you could do, and even then I don't think it's all that great an idea, is to use preg_split to split your input text into tags-and-non-tags. So you'll end up with a list like this:

  • [QUOTE=1]
  • (blank)
  • [QUOTE=2]
  • This is a quote from someone else
  • [/QUOTE]
  • This is someone else quoting someone else
  • [/QUOTE]

Then you run through the list, converting the tags to HTML and outputting the plain text unmodified. You can even get fancy and keep "nesting" counts so that if you encounter a "[/quote]" when you're not expecting it, you can handle the situation a bit better than just outputting invalid HTML. Alternatively, you just output things as you find them and let HTML Purifier or something clean it up later.
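A rough PHP sketch of that split-and-walk idea follows. The function name render_quotes_by_tokens, the blockquote/cite markup, and the choice to emit a stray closing tag as literal text are all assumptions for illustration; the text is assumed to be HTML-escaped already.

```php
<?php
// Hypothetical helper illustrating the preg_split approach with a nesting counter.
// Assumes $text has already been HTML-escaped elsewhere.
function render_quotes_by_tokens(string $text): string
{
    // Split on quote tags and keep the tags in the result (DELIM_CAPTURE).
    // NO_EMPTY drops zero-length pieces between adjacent tags.
    $tokens = preg_split(
        '/(\[quote=[^\]]*\]|\[\/quote\])/i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    if ($tokens === false) {
        return $text; // PCRE error: leave the input untouched
    }

    $html = '';
    $depth = 0; // number of quote blocks currently open

    foreach ($tokens as $token) {
        if (preg_match('/^\[quote=([^\]]*)\]$/i', $token, $m)) {
            $depth++;
            $html .= '<blockquote><cite>' . $m[1] . ' said:</cite> ';
        } elseif (strcasecmp($token, '[/quote]') === 0) {
            if ($depth > 0) {
                $depth--;
                $html .= '</blockquote>';
            } else {
                $html .= $token; // unexpected closing tag: keep it as literal text
            }
        } else {
            $html .= $token; // plain text passes through unmodified
        }
    }

    // Close any quote blocks the author left open.
    return $html . str_repeat('</blockquote>', $depth);
}

echo render_quotes_by_tokens("[QUOTE=1][QUOTE=2]inner[/QUOTE]outer[/QUOTE]"), "\n";
```

Because the depth counter tracks open blocks, unbalanced input still produces well-formed HTML, which is harder to guarantee with a single regex.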
