PHP preg_split if not inside curly brackets

Posted 2019-07-16 05:18

Question:

I'm making a scripting language interpreter in PHP. I have this code in that scripting language:

write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly

(Yes, it's hard to believe but that's the syntax)

Which regex should I use to split this on spaces, but only on spaces that are not inside the curly brackets? I want to turn the above code into this array:

  1. write
  2. Hello, World!
  3. in
  4. either
  5. the
  6. color
  7. blue
  8. or
  9. red
  10. or
  11. #00AA00
  12. and
  13. in
  14. either
  15. the
  16. font
  17. Arial Black
  18. or
  19. Monaco
  20. where
  21. both
  22. the
  23. color
  24. and
  25. the
  26. font
  27. are
  28. determined
  29. randomly

The strings inside the curly brackets (items 2, 7, 9, 11, 17, and 19 above, originally shown in bold) must be one element each. So {Hello, World!} cannot become 1. Hello, 2. World!

How can I do this?

Thanks in advance.

Answer 1:

What about using something like this:

$str = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';

$matches = array();
preg_match_all('#\{.*?\}|[^ ]+#', $str, $matches);

var_dump($matches[0]);

This gives you:

array
  0 => string 'write' (length=5)
  1 => string '{Hello, World!}' (length=15)
  2 => string 'in' (length=2)
  3 => string 'either' (length=6)
  4 => string 'the' (length=3)
  5 => string 'color' (length=5)
  6 => string '{blue}' (length=6)
  7 => string 'or' (length=2)
  8 => string '{red}' (length=5)
  9 => string 'or' (length=2)
  10 => string '{#00AA00}' (length=9)
  11 => string 'and' (length=3)
  12 => string 'in' (length=2)
  13 => string 'either' (length=6)
  14 => string 'the' (length=3)
  15 => string 'font' (length=4)
  16 => string '{Arial Black}' (length=13)
  17 => string 'or' (length=2)
  18 => string '{Monaco}' (length=8)
  19 => string 'where' (length=5)
  20 => string 'both' (length=4)
  21 => string 'the' (length=3)
  22 => string 'color' (length=5)
  23 => string 'and' (length=3)
  24 => string 'the' (length=3)
  25 => string 'font' (length=4)
  26 => string 'are' (length=3)
  27 => string 'determined' (length=10)
  28 => string 'randomly' (length=8)

Then you just have to iterate over those results; the ones starting with { and ending with } are your "important" words, and the others are the rest.


Edit after the comment: one way to identify the important words would be something like this:

foreach ($matches[0] as $word) {
    $m = array();
    if (preg_match('#^\{(.*)\}$#', $word, $m)) {
        echo '<strong>' . htmlspecialchars($m[1]) . '</strong>';
    } else {
        echo htmlspecialchars($word);
    }
    echo '<br />';
}

Or, like you said, working with strpos and strlen would work too ;-)
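Following up on the strpos/strlen idea, here is a minimal sketch that runs the same preg_match_all() pattern and then strips the braces with plain string functions, so the result matches the array asked for in the question (braces removed). The helper name splitOutsideBraces() is hypothetical, and it assumes brace groups are never nested.

```php
<?php
// Hypothetical helper: split on spaces outside {...}, then strip the braces.
// Assumes brace groups are never nested.
function splitOutsideBraces(string $str): array
{
    // Same pattern as above: a {...} group, or a run of non-space characters
    preg_match_all('#\{.*?\}|[^ ]+#', $str, $matches);

    $words = array();
    foreach ($matches[0] as $word) {
        $len = strlen($word);
        // A token is a brace group if it starts with "{" and ends with "}"
        if ($len >= 2 && strpos($word, '{') === 0 && $word[$len - 1] === '}') {
            $words[] = substr($word, 1, $len - 2); // drop the two braces
        } else {
            $words[] = $word;
        }
    }
    return $words;
}

$str = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';
print_r(splitOutsideBraces($str));
```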



Answer 2:

Does the order matter? If not, you could extract all the {...} groups, remove them from the string, and then split the leftover string on spaces.
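A minimal sketch of that approach, assuming the relative order of brace groups and plain words really does not matter and that braces are never nested. It collects the {...} contents with preg_match_all(), cuts the groups out with preg_replace(), and splits what is left on whitespace with preg_split(). The variable names are illustrative only.

```php
<?php
$str = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';

// 1. Collect the contents of every {...} group (without the braces)
preg_match_all('~\{([^}]*)\}~', $str, $m);
$braced = $m[1];

// 2. Remove the groups from the string, leaving a space in their place
$rest = preg_replace('~\{[^}]*\}~', ' ', $str);

// 3. Split the leftover text on runs of whitespace, dropping empty pieces
$plain = preg_split('~\s+~', $rest, -1, PREG_SPLIT_NO_EMPTY);

print_r($braced); // the "important" words
print_r($plain);  // everything else; positions relative to the groups are lost
```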



Answer 3:

I would replace the brace groups using preg_replace_callback(). With the callback, you can keep track of their order and replace them with placeholders like %var1%, %var2%, etc.

I don't think there is a way to explode() by spaces while skipping the spaces inside the curly brackets without modifying the string beforehand.
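A sketch of the placeholder idea, assuming the placeholder format %varN% never occurs in the input, braces are not nested, and words are separated by single spaces (explode() would produce empty tokens for double spaces). preg_replace_callback() swaps each {...} group for a numbered placeholder and remembers its contents, explode() splits on spaces, and the placeholders are then mapped back. All variable names are hypothetical.

```php
<?php
$str = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';

$vars = array();
// Replace each {...} group with %var1%, %var2%, ... and remember its contents
$masked = preg_replace_callback('~\{([^}]*)\}~', function ($m) use (&$vars) {
    $vars[] = $m[1];
    return '%var' . count($vars) . '%';
}, $str);

// No remaining space is inside a group, so a plain explode() is safe now
$tokens = explode(' ', $masked);

// Put the original contents back in place of the placeholders, in order
$words = array();
foreach ($tokens as $token) {
    if (preg_match('~^%var(\d+)%$~', $token, $p)) {
        $words[] = $vars[(int) $p[1] - 1];
    } else {
        $words[] = $token;
    }
}
print_r($words);
```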



Answer 4:

This could be done iteratively, without a regexp. You iterate over the entire string and append every character to a temporary variable until you hit a space. When you hit a space, you push the contents of the temporary variable onto the array, empty it, and continue.

If you find an opening bracket, you set a boolean flag and then append everything, spaces included, to the temporary variable until you find the closing bracket. And so on.

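<?php
$string = "write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly";
$bracket = false;   // true while we are inside {...}
$words = array();
$temp = "";

$length = strlen($string);
for ($i = 0; $i < $length; $i++) {
    $char = $string[$i];
    if ($bracket) {
        // Inside braces: keep every character, including spaces
        $temp .= $char;
        if ($char == "}") {
            $bracket = false;
            $words[] = $temp;
            $temp = "";     // start a fresh word after the group
        }
    } else {
        if ($char == " ") {
            // A space outside braces ends the current word
            if ($temp !== "") {
                $words[] = $temp;
                $temp = "";
            }
        } elseif ($char == "{") {
            $temp .= $char;
            $bracket = true;
        } else {
            $temp .= $char;
        }
    }
}

// The last word has no trailing space, so push it explicitly
if ($temp !== "") {
    $words[] = $temp;
}

print_r($words);
?>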

Code is untested.



Answer 5:

You want to split on all spaces that are not contained within curly braces.

Match either a curly-brace expression or a run of non-whitespace characters, discard what was matched so far with \K (it resets the start of the reported match), and use the space that follows as the delimiter.

Code:

$text = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';

var_export(preg_split('~({[^}]*}|\S+)\K ~', $text));
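For comparison, here is a different technique (not the one used in this answer): the PCRE backtracking verbs (*SKIP)(*FAIL) let the pattern match a whole {...} group, throw that match away, and resume searching after the group, so only spaces outside the braces are ever used as delimiters. A minimal sketch, assuming braces are not nested; like the \K version, it keeps the braces in the output.

```php
<?php
$text = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';

// Alternative 1 consumes a {...} group; (*SKIP)(*FAIL) then rejects it and
// makes the engine resume after the group, so its inner spaces are never
// tried. Alternative 2 is the actual delimiter: a single space.
$parts = preg_split('~\{[^}]*\}(*SKIP)(*FAIL)| ~', $text);

var_export($parts);
```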

P.S. You can replace the curly braces with <strong> tags like this: https://3v4l.org/fXrgE

P.P.S. You could build your exact ordered list with preg_replace_callback() (paste the code into phptester.net to see the HTML rendered):

$text = 'write {Hello, World!} in either the color {blue} or {red} or {#00AA00} and in either the font {Arial Black} or {Monaco} where both the color and the font are determined randomly';

echo "<ol>" , preg_replace_callback('~{([^}]*)}|(\S+)~', function($m) {
        if (!isset($m[2])) {
            return "<li><strong>{$m[1]}</strong></li>\n";
        }
        return "<li>{$m[2]}</li>\n";
    },
    $text) , "</ol>";