Removing string inside brackets

Posted 2020-04-27 14:49

Question:

Good day!

I would like some help removing text inside square brackets, along with the square brackets themselves.

The string looks like this:

$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";

I only want to remove the brackets (and their contents) that contain "www.example.com". I want to keep "[test]" in the string, along with any other brackets that do not have "www.example.com" in them.

Thanks!

Answer 1:

Note: The OP has dramatically changed the question. This solution was designed to handle the question in its original (more difficult) form, before the "www.example.com" constraint was added. Although the following solution has been modified to handle this additional constraint, a simpler solution (i.e. anubhava's answer) would now probably suffice.

Here is my tested solution:

function strip_bracketed_special($text) {
    $re = '% # Remove bracketed text having "www.example.com" within markup.
          # Skip comments, CDATA, SCRIPT & STYLE elements, and HTML tags.
          (                      # $1: HTML stuff to be left alone.
            <!--.*?-->           # HTML comments (non-SGML compliant).
          | <!\[CDATA\[.*?\]\]>  # CDATA sections
          | <script.*?</script>  # SCRIPT elements.
          | <style.*?</style>    # STYLE elements.
          | <\w+                 # HTML element start tags.
            (?:                  # Group optional attributes.
              \s+                # Attributes separated by whitespace.
              [\w:.-]+           # Attribute name is required
              (?:                # Group for optional attribute value.
                \s*=\s*          # Name and value separated by "="
                (?:              # Group for value alternatives.
                  "[^"]*"        # Either double quoted string,
                | \'[^\']*\'     # or single quoted string,
                | [\w:.-]+       # or un-quoted string (limited chars).
                )                # End group of value alternatives.
              )?                 # Attribute values are optional.
            )*                   # Zero or more start tag attributes.
            \s*/?>               # End of start tag (optional self-close).
          | </\w+>               # HTML element end tags.
          )                      # End #1: HTML Stuff to be left alone.
        | # Or... Bracketed structures containing www.example.com
          \s*\[                  # (optional ws), Opening bracket.
          [^\]]*?                # Match up to required content.
          www\.example\.com      # Required bracketed content.
          [^\]]*                 # Match up to closing bracket.
          \]\s*                  # Closing bracket, (optional ws).
        %six';
    return preg_replace($re, '$1', $text);
}

Note that the regex does not remove bracketed material inside HTML comments, CDATA sections, SCRIPT and STYLE elements, or HTML tag attribute values. Given the following XHTML markup (which tests these scenarios), the function above removes bracketed text only where it appears in HTML element content:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal. [Remove this www.example.com]</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal. [Remove this www.example.com]</h1>
<p>Test special removal. [Remove this www.example.com]</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal. [Remove this www.example.com]
</p>
</div>
</body>
</html>

Here is the same markup after being run through the PHP function above:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <title>Test special removal.</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css">
        .test.before {
            content: "[Do not remove www.example.com]";
        }
    </style>
    <script type="text/javascript">
        // <![CDATA[ ["Do not remove www.example.com"] ]]>
        var ob = {};
        ob["Do not remove www.example.com"] = "stuff";
        var str = "[Do not remove www.example.com]";
    </script>
</head>
<body>
<!-- <![CDATA[ ["Do not remove www.example.com"] ]]> -->
<div title="[Do not remove www.example.com]">
<h1>Test special removal.</h1>
<p>Test special removal.</p>
<p onclick='var str = "[Do not remove www.example.com]"; return false;'>
    Test special removal. [Do not remove this]
    Test special removal.</p>
</div>
</body>
</html>

This solution should work quite well for just about any valid (X)HTML you can throw at it. (But please, no funky shorttags or SGML comments!)
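
As a rough illustration of the capture-and-restore idea used above, here is a much smaller sketch. It assumes that skipping anything of the form <...> is good enough, so it does not protect comments, CDATA, SCRIPT or STYLE content the way the full regex does, and it would be confused by a ">" inside an attribute value. The function name strip_bracketed_outside_tags and the sample input are hypothetical.

```php
<?php
// Hypothetical, reduced version of the technique: alternative 1 captures a
// tag into group 1 so it can be put back unchanged; alternative 2 matches a
// bracketed group containing www.example.com (plus surrounding whitespace).
function strip_bracketed_outside_tags(string $text): string
{
    $re = '~(<[^>]*>)|\s*\[[^\]]*?www\.example\.com[^\]]*\]\s*~i';
    // When alternative 2 matched, group 1 is empty, so '$1' deletes the match.
    return preg_replace($re, '$1', $text);
}

$in = '<p title="[keep www.example.com]">Intro. [drop www.example.com]</p> [test]';
echo strip_bracketed_outside_tags($in), "\n";
// prints: <p title="[keep www.example.com]">Intro.</p> [test]
```

The bracketed text inside the title attribute survives because the whole tag is consumed by the first alternative before the second one gets a chance to match inside it.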



Answer 2:

$str = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor.";
$str = preg_replace('~\[[^]]*?www\.example\.com[^]]*\]~si', "", $str);
var_dump($str);

OUTPUT

string(83) "Lorem ipsum dolor<br />  <br />some text here. Text here. [test] Lorem ipsum dolor."

PS: It also works when the bracketed text is broken across multiple lines.
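
The multi-line behaviour comes from the negated character class [^]], which matches any character except "]", newlines included. A minimal sketch to check this, using a made-up input string and a hypothetical wrapper function:

```php
<?php
// Hypothetical wrapper around the same pattern, for easier testing.
function remove_bracketed_with_domain(string $str): string
{
    // [^]] matches any character other than "]", including "\n".
    return preg_replace('~\[[^]]*?www\.example\.com[^]]*\]~i', '', $str);
}

$str = "Intro [ see\nwww.example.com\nfor details ] outro [test]";
var_dump(remove_bracketed_with_domain($str));
// string(19) "Intro  outro [test]"
```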



Answer 3:

Use a regular expression such as /\[.*?\]/. The backslashes are necessary; without them, [.*?] is a character class that matches a single ".", "*", or "?" character instead.
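
A small sketch of the difference, using a made-up test string and hypothetical function names. Note that the escaped pattern removes every bracketed group, so on its own it would also remove "[test]".

```php
<?php
// Escaped brackets: matches a literal "[", the shortest run of characters, then "]".
function strip_all_brackets(string $s): string
{
    return preg_replace('/\[.*?\]/', '', $s);
}

// Unescaped: [.*?] is a character class, so single ".", "*" and "?" characters are removed.
function strip_class_chars(string $s): string
{
    return preg_replace('/[.*?]/', '', $s);
}

$s = "a.b [x] c*d? [test]";
var_dump(strip_all_brackets($s)); // string(10) "a.b  c*d? "
var_dump(strip_class_chars($s));  // string(16) "ab [x] cd [test]"
```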



Answer 4:

The simplest method I can think of is using a regular expression to match everything between [ and ], then replacing it with "". The code below will replace the string you used in the example. If the actual strings that need to be removed are more complex, you can change the regular expression to match them. I recommend using regexpal.com for testing your regular expressions.

// The pattern needs delimiters (here "/"), otherwise preg_replace() fails.
$string = preg_replace('/\[[A-Za-z .]*\]/', '', $string);
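
To see what the character class in that pattern actually matches, here is a small sketch, written with "/" delimiters, which the preg functions require. The helper name bracket_matches and the extra "[item 42]" sample are made up for illustration. On the question's string the pattern matches "[test]" as well, so it would need narrowing to meet the "www.example.com" requirement.

```php
<?php
// Hypothetical helper: list every substring the pattern would remove.
function bracket_matches(string $text): array
{
    preg_match_all('/\[[A-Za-z .]*\]/', $text, $m);
    return $m[0];
}

$string = "Lorem ipsum dolor<br /> [ Context are found on www.example.com ] <br />some text here. Text here. [test] Lorem ipsum dolor. [item 42]";
print_r(bracket_matches($string));
// Matches "[ Context are found on www.example.com ]" and "[test]";
// "[item 42]" is skipped because digits are not in the character class.
```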



Answer 5:

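The code below removes every bracketed group, brackets included. The <br /> tags are left in place, so they show up as line breaks when the result is viewed in a browser: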

$str = "Lorem ipsum dolor<br />[ Context are found on www.example.com ] <br />some text here";
$str = preg_replace( "/\[[^\]]*\]/m", "", $str);
echo $str;

Output (as rendered in a browser):

Lorem ipsum dolor

some text here
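
If actual newline characters are wanted in place of the <br /> tags, one possible follow-up step is a second replacement. This is only a sketch and not part of the answer's code; the function name strip_brackets_and_br and the choice to swallow whitespace around each tag are assumptions.

```php
<?php
// Hypothetical helper: remove bracketed groups, then turn <br>, <br/> or
// <br /> (plus any whitespace around the tag) into a newline character.
function strip_brackets_and_br(string $str): string
{
    $str = preg_replace('/\[[^\]]*\]/', '', $str);
    return preg_replace('~\s*<br\s*/?>\s*~i', "\n", $str);
}

$str = "Lorem ipsum dolor<br />[ Context are found on www.example.com ] <br />some text here";
echo strip_brackets_and_br($str), "\n";
// prints "Lorem ipsum dolor", an empty line, then "some text here"
```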