regex: If string contains specific word within bra

Posted 2020-06-06 20:38

Question:

Using regex, I want to detect whether a specific word exists within brackets in a string and, if it does, remove the brackets and their content.

The words I want to target are:

picture
see
lorem

So, here are 3 string examples:

$text1 = 'Hello world (see below).';
$text2 = 'Lorem ipsum (there is a picture here) world!';
$text3 = 'Attack on titan (is lorem) great but (should not be removed).';

What regex can I use with preg_replace():

$text = preg_replace($regex, '' , $text);

To remove these brackets and their content if they contain those words?

Result should be:

$text1 = 'Hello world.';
$text2 = 'Lorem ipsum world!';
$text3 = 'Attack on titan great but (should not be removed).';

Here's an ideone for testing.

Answer 1:

You could use the following approach (thanks to @Casimir for pointing out an error before!):

<?php
$regex = '~
            (\h*\(                             # group: optional horizontal whitespace, then an opening bracket
                [^)]*?                         # lazily match anything except a closing bracket
                (?i:picture|see|lorem)         # one of the target words, case-insensitive
                [^)]*?                         # again, anything except a closing bracket
            \))                                # the closing bracket, end of group
            ~x';                               // x = extended (verbose) mode: whitespace and # comments are ignored

$text = array();
$text[] = 'Hello world (see below).';
$text[] = 'Lorem ipsum (there is a picture here) world!';
$text[] = 'Attack on titan (is lorem) great but (should not be removed).';

// preg_replace() accepts an array subject and returns an array of results
$text = preg_replace($regex, '', $text);

print_r($text);
?>

See a demo on ideone.com and on regex101.com.
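
As a quick check of what the inline `(?i:...)` group does, the sketch below runs a single-line equivalent of the pattern above against differently cased inputs. The sample strings are made up for illustration and do not come from the question.

```php
<?php
// Single-line equivalent of the pattern above (capture group omitted,
// since the replacement string does not use it).
$regex = '~\h*\([^)]*?(?i:picture|see|lorem)[^)]*?\)~';

// Hypothetical inputs, chosen to exercise the case-insensitive group.
$samples = array(
    'Hello world (See below).',
    'Intro (LOREM ipsum) text.',
    'Kept (nothing to match) here.',
);

// With an array subject, preg_replace() processes every entry.
$cleaned = preg_replace($regex, '', $samples);

print_r($cleaned);
```

The first two strings should lose their brackets despite the different casing, while the third is left alone because none of the target words appears inside its brackets.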



Answer 2:

You can use this regex for searching:

\h*\([^)]*\b(?:picture|see|lorem)\b[^)]*\)

This breaks down as follows:

\h*                    # match 0 or more horizontal spaces
\(                     # match left (
[^)]*                  # match 0 or more of any char that is not )
\b                     # match a word boundary
(?:picture|see|lorem)  # match any of the 3 keywords
\b                     # match a word boundary
[^)]*                  # match 0 or more of any char that is not )
\)                     # match right )

Replace each match with an empty string:

RegEx Demo
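
To illustrate what the two `\b` word boundaries contribute, here is a minimal sketch comparing this pattern with a variant that omits them. The input string and the variable names are assumptions made for the example.

```php
<?php
// Pattern from this answer, with and without the \b word boundaries.
$withBoundary    = '/\h*\([^)]*\b(?:picture|see|lorem)\b[^)]*\)/';
$withoutBoundary = '/\h*\([^)]*(?:picture|see|lorem)[^)]*\)/';

// Hypothetical input: "seen" contains "see" but is a different word.
$input = 'Now (as seen before) done.';

$strict = preg_replace($withBoundary, '', $input);
$loose  = preg_replace($withoutBoundary, '', $input);

echo $strict, "\n"; // Now (as seen before) done.
echo $loose, "\n";  // Now done.
```

With the boundaries in place, only whole-word occurrences of the keywords trigger a removal. Without them, any bracket whose content merely contains one of the keywords as a substring is removed as well.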

Code:

$re = '/\h*\([^)]*\b(?:picture|see|lorem)\b[^)]*\)/'; 

$result = preg_replace($re, '', $input);
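
If the word list is likely to change, the same pattern could be built from an array. The following is a sketch only: `remove_brackets_containing()` is a hypothetical helper name, not part of the original answer, and it assumes PHP 7 or later for the type declarations.

```php
<?php
/**
 * Remove every (...) group that contains one of the given words.
 * Hypothetical helper, not part of the original answer.
 */
function remove_brackets_containing(string $input, array $words): string
{
    // Escape each word so regex metacharacters in it are treated literally.
    $alternatives = implode('|', array_map(
        function ($word) {
            return preg_quote($word, '/');
        },
        $words
    ));

    // Same structure as the pattern in this answer.
    $re = '/\h*\([^)]*\b(?:' . $alternatives . ')\b[^)]*\)/';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($re, '', $input) ?? $input;
}

$words = array('picture', 'see', 'lorem');

echo remove_brackets_containing('Hello world (see below).', $words), "\n";
echo remove_brackets_containing('Lorem ipsum (there is a picture here) world!', $words), "\n";
echo remove_brackets_containing('Attack on titan (is lorem) great but (should not be removed).', $words), "\n";
```

Run against the three strings from the question, this should print the expected results listed there. Like the pattern itself, the helper is case-sensitive; appending the `i` modifier to `$re` would make it match `See` or `LOREM` too.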