Parsing command arguments in PHP

Posted 2019-01-19 10:57

Is there a native "PHP way" to parse command arguments from a string? For example, given the following string:

foo "bar \"baz\"" '\'quux\''

I'd like to create the following array:

array(3) {
  [0] =>
  string(3) "foo"
  [1] =>
  string(9) "bar "baz""
  [2] =>
  string(6) "'quux'"
}

I've already tried to leverage token_get_all(), but PHP's variable interpolation syntax (e.g. "foo ${bar} baz") pretty much rained on my parade.

I know full well that I could write my own parser. Command argument syntax is super simplistic, but if there's an existing native way to do it, I'd much prefer that over rolling my own.

EDIT: Please note that I am looking to parse the arguments from a string, NOT from the shell/command-line.


EDIT #2: Below is a more comprehensive example of the expected input -> output for arguments:

foo -> foo
"foo" -> foo
'foo' -> foo
"foo'foo" -> foo'foo
'foo"foo' -> foo"foo
"foo\"foo" -> foo"foo
'foo\'foo' -> foo'foo
"foo\foo" -> foo\foo
"foo\\foo" -> foo\foo
"foo foo" -> foo foo
'foo foo' -> foo foo

11 answers
不美不萌又怎样
#2 · 2019-01-19 11:22

I suggest something like:

$str = <<<EOD
foo "bar \"baz\"" '\'quux\''
EOD;

// -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1
$match = preg_split("/('(?:.*)(?<!\\\\)(?>\\\\\\\\)*'|\"(?:.*)(?<!\\\\)(?>\\\\\\\\)*\")/U", $str, -1, PREG_SPLIT_DELIM_CAPTURE);

var_dump(array_filter(array_map('trim', $match)));

The regexp was written with some assistance from the question "string to array, split by single and double quotes".

You still have to unescape the strings in the array afterwards.

array(3) {
  [0]=>
  string(3) "foo"
  [1]=>
  string(13) ""bar \"baz\"""
  [3]=>
  string(10) "'\'quux\''"
}

But you get the picture.
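The unescaping step mentioned above could be sketched roughly as follows. The helper name unquote_token is hypothetical (it is not part of the answer), and it assumes the rules from the question's input -> output table: inside quotes, a backslash only escapes the surrounding quote character or another backslash, and any other backslash is kept literally.

```php
<?php
// Hypothetical helper, for illustration only: strips one pair of matching
// outer quotes and resolves \<quote> and \\ inside them.
function unquote_token(string $token): string
{
    $len = strlen($token);
    if ($len < 2) {
        return $token;
    }
    $quote = $token[0];
    if (($quote !== '"' && $quote !== "'") || $token[$len - 1] !== $quote) {
        return $token; // not wrapped in matching quotes: leave untouched
    }
    $inner = (string) substr($token, 1, -1);
    // A backslash followed by the quote character or by another backslash
    // collapses to that character; every other backslash stays as it is.
    $pattern = '/\\\\([' . preg_quote($quote, '/') . '\\\\])/';
    return (string) preg_replace($pattern, '$1', $inner);
}

// Tokens as they come out of the preg_split() call above.
$raw = ['foo', '"bar \\"baz\\""', "'\\'quux\\''"];
print_r(array_map('unquote_token', $raw));
```

Since array_filter() preserves keys, wrapping the result in array_values() would give a cleanly indexed array if that matters to the caller.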

冷血范
#3 · 2019-01-19 11:27

Since you asked for a native way to do this, and PHP doesn't provide any function that mirrors how $argv is built, you could work around that like this:

Create an executable PHP script, foo.php:

<?php

// Skip this file name
array_shift( $argv );

// Output valid PHP code
echo 'return '. var_export( $argv, true ).';';

?>

And use it to retrieve the arguments the way PHP would actually receive them if you exec'd $command:

function parseCommand( $command )
{
    return eval(
        shell_exec( "php foo.php ".$command )
    );
}


$command = <<<CMD
foo "bar \"baz\"" '\'quux\''
CMD;


$args = parseCommand( $command );

var_dump( $args );

Advantages:

  • Very simple code
  • Should be faster than any regular expression
  • Matches what PHP actually receives in $argv when the command goes through the shell

Drawbacks:

  • Requires execution privilege on the host
  • shell_exec() plus eval() on the same variable: let's party! You have to either trust the input or filter it so heavily that a simple regexp may be faster (I didn't dig deep into that).
不美不萌又怎样
#4 · 2019-01-19 11:27

To my knowledge, there is no native function for parsing commands. However, I have written a function that does the trick in plain PHP. By calling str_replace several times, you can convert the string into something that can be split into an array. I don't know how fast you consider fast, but when running the function 400 times, the slowest run took under 34 microseconds.

function get_array_from_commands($string) {
    /*
    **  Turns a command string into an array
    **  of fields through multiple calls to
    **  str_replace, until we have a single
    **  string to split using explode().
    **  Returns an array.
    */

    // Replace escaped single quotes (\') with
    // their HTML entity
    $string = str_replace("\'","&#x27;",$string);
    // Do the same with escaped double quotes (\")
    $string = str_replace("\\\"","&quot;",$string);
    // Now turn all remaining single quotes into double quotes
    $string = str_replace("'","\"",$string);
    // Turn " " into " so we don't replace it too many times
    $string = str_replace("\" \"","\"",$string);
    // Turn the remaining double quotes into @@@ or some other value
    $string = str_replace("\"","@@@",$string);
    // Explode by @@@ or value listed above
    $string = explode("@@@",$string);
    return $string;
}
做自己的国王
#5 · 2019-01-19 11:32

If you want to follow the same parsing rules as the shell, there are some edge cases that I think aren't easy to cover with regular expressions, so you might want to write a method that does this (example):

$string = 'foo "bar \"baz\"" \'\\\'quux\\\'\'';
echo $string, "\n";
print_r(StringUtil::separate_quoted($string));

Output:

foo "bar \"baz\"" '\'quux\''
Array
(
    [0] => foo
    [1] => bar "baz"
    [2] => 'quux'
)

I guess this pretty much matches what you're looking for. The function used in the example can be configured for the escape character as well as for the quotes; you can even use brackets like [ ] to form a "quote" if you like.

To support strings other than native byte-safe ones with one character per byte, you can pass an array instead of a string. The array needs to contain one character per value as a binary-safe string. For example, pass Unicode in NFC form as UTF-8 with one code point per array value, and this should do the job for Unicode.
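The body of StringUtil::separate_quoted() isn't shown here, so the following is only a minimal sketch of what a character-by-character splitter along those lines might look like. The function name split_command_string is hypothetical, and the sketch makes several assumptions: only " and ' act as quotes, the escape character is a fixed backslash that is honoured only inside quotes (before the quote character or another backslash), input is processed byte by byte, and an unterminated quote is treated as an error.

```php
<?php
// Hypothetical stand-in for illustration; this is NOT the
// StringUtil::separate_quoted() implementation referenced above.
function split_command_string(string $input): array
{
    $args = [];
    $current = '';
    $inToken = false; // true once an argument has started (even an empty "")
    $quote = null;    // active quote character, or null outside quotes
    $len = strlen($input);

    for ($i = 0; $i < $len; $i++) {
        $ch = $input[$i];

        if ($quote !== null) {
            $next = $i + 1 < $len ? $input[$i + 1] : null;
            if ($ch === '\\' && ($next === $quote || $next === '\\')) {
                $current .= $next; // escaped quote or escaped backslash
                $i++;
            } elseif ($ch === $quote) {
                $quote = null;     // closing quote
            } else {
                $current .= $ch;   // includes backslashes that escape nothing
            }
        } elseif ($ch === '"' || $ch === "'") {
            $quote = $ch;          // opening quote
            $inToken = true;
        } elseif (strpos(" \t\r\n", $ch) !== false) {
            if ($inToken) {        // whitespace outside quotes ends the argument
                $args[] = $current;
                $current = '';
                $inToken = false;
            }
        } else {
            $current .= $ch;
            $inToken = true;
        }
    }

    if ($quote !== null) {
        throw new InvalidArgumentException('Unterminated quote in input');
    }
    if ($inToken) {
        $args[] = $current;
    }
    return $args;
}

print_r(split_command_string('foo "bar \\"baz\\"" \'\\\'quux\\\'\''));
```

A small state machine like this keeps the quoting rules in one place, which makes edge cases such as empty quoted arguments or a lone backslash easier to reason about than in a single regular expression.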

成全新的幸福
#6 · 2019-01-19 11:38

Regexes are quite powerful: (?s)(?<!\\)("|')(?:[^\\]|\\.)*?\1|\S+. So what does this expression mean?

  • (?s) : set the s modifier to match newlines with a dot .
  • (?<!\\) : negative lookbehind, check if there is no backslash preceding the next token
  • ("|') : match a single or double quote and put it in group 1
  • (?:[^\\]|\\.)*? : match everything not \, or match \ with the immediately following (escaped) character
  • \1 : match what is matched in the first group
  • | : or
  • \S+ : match anything except whitespace one or more times.

The idea is to capture the opening quote in a group to remember whether it's a single or a double one. The negative lookbehind makes sure we don't start a match at an escaped quote. \1 matches the corresponding closing quote. Finally, an alternation matches any run of non-whitespace characters. This solution is handy and applies to almost any language/flavor that supports lookbehinds and backreferences. Of course, it expects the quotes to be closed. The results are found in group 0.

Let's implement it in PHP:

$string = <<<INPUT
foo "bar \"baz\"" '\'quux\''
'foo"bar' "baz'boz"
hello "regex

world\""
"escaped escape\\\\"
INPUT;

preg_match_all('#(?<!\\\\)("|\')(?:[^\\\\]|\\\\.)*?\1|\S+#s', $string, $matches);
print_r($matches[0]);

If you wonder why I used 4 backslashes: in a single-quoted PHP string, \\\\ becomes \\, which the regex engine then reads as one literal backslash.

Output

Array
(
    [0] => foo
    [1] => "bar \"baz\""
    [2] => '\'quux\''
    [3] => 'foo"bar'
    [4] => "baz'boz"
    [5] => hello
    [6] => "regex

world\""
    [7] => "escaped escape\\"
)



Removing the quotes

Quite simple using named groups and a simple loop:

preg_match_all('#(?<!\\\\)("|\')(?<escaped>(?:[^\\\\]|\\\\.)*?)\1|(?<unescaped>\S+)#s', $string, $matches, PREG_SET_ORDER);

$results = array();
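foreach($matches as $array){
   // Group 1 holds the opening quote; it is empty when the \S+ branch matched.
   // Testing it (instead of empty($array['escaped'])) keeps "" and "0" intact.
   if($array[1] !== ''){
      $results[] = $array['escaped'];
   }else{
      $results[] = $array['unescaped'];
   }
}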
print_r($results);


迷人小祖宗
#7 · 2019-01-19 11:39

I would recommend going another way. There is already a "standard" way of handling command-line arguments. It's called getopt:

http://php.net/manual/en/function.getopt.php

I would suggest changing your script to use getopt; then anyone using your script will pass parameters in a familiar, more or less "industry standard" way instead of having to learn your way of doing things.
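For reference, a minimal getopt() call might look like the sketch below. The option names (-v/--verbose and -n/--name) are made up for illustration. Note that getopt() reads the options the script itself was started with; as far as I know it cannot be pointed at an arbitrary string, so it only helps if the arguments really do arrive on the command line.

```php
<?php
// Hypothetical options: -v / --verbose is a flag, -n / --name takes a value.
// A trailing colon marks an option that requires a value.
$options = getopt('vn:', ['verbose', 'name:']);

// Flags show up as keys (with the value false), so test them with isset().
$verbose = isset($options['v']) || isset($options['verbose']);

// Prefer the long form, fall back to the short form, then to a default.
$name = $options['name'] ?? $options['n'] ?? 'world';

if ($verbose) {
    fwrite(STDERR, "verbose mode on\n");
}
echo "Hello, {$name}\n";
```

Invoked as `php script.php -v --name=Ada`, this would take the verbose branch and greet Ada; with no options at all it falls back to the default name.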
