Why is array_merge_recursive not recursive?

Posted 2019-07-12 08:20

I recently found a bug in my application caused by unexpected behaviour of array_merge_recursive. Let's take a look at this simple example:

$array1 = [
    1 => [
        1 => 100,
        2 => 200,
    ],
    2 => [
        3 => 1000,
    ],
    3 => [
        1 => 500
    ]
];
$array2 = [
    3 => [
        1 => 500
    ]
];
array_merge_recursive($array1, $array2);
//returns: array:4 [ 0 => //...

I expected to get an array with 3 elements: keys 1, 2, and 3. However, the function returns an array with keys 0, 1, 2 and 3. So 4 elements, while I expected only 3. When I replace the numbers by their alphabetical equivalents (a, b, c) it returns an array with only 3 elements: a, b and c.

$array1 = [
    'a' => [
        1 => 100,
        2 => 200,
    ],
    'b' => [
        3 => 1000,
    ],
    'c' => [
        1 => 500
    ]
];
$array2 = [
    'c' => [
        1 => 500
    ]
];
array_merge_recursive($array1, $array2);
//returns: array:3 [ 'a' => //...

This is (to me at least) unexpected behaviour, but at least it's documented:

http://php.net/manual/en/function.array-merge-recursive.php

If the input arrays have the same string keys, then the values for these keys are merged together into an array, and this is done recursively, so that if one of the values is an array itself, the function will merge it with a corresponding entry in another array too. If, however, the arrays have the same numeric key, the later value will not overwrite the original value, but will be appended.

The documentation isn't very clear about what 'appended' means. It turns out that elements of $array1 with a numeric key are treated as indexed elements, so they lose their current key: the returned array starts with 0. This leads to strange outcomes when using both numeric and string keys in an array, but let's not blame PHP if you're using a bad practice like that. In my case, the problem was solved by using array_replace_recursive instead, which did the expected trick. ('replace' in that function means replace if the key exists, append otherwise; naming functions is hard!)
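To make the difference concrete, here is a minimal sketch that runs both functions on the data from the first example. The variable names $merged and $replaced are only for illustration.

```php
<?php
// Same data as the first example in the question.
$array1 = [
    1 => [1 => 100, 2 => 200],
    2 => [3 => 1000],
    3 => [1 => 500],
];
$array2 = [
    3 => [1 => 500],
];

// array_merge_recursive renumbers integer keys and appends.
$merged = array_merge_recursive($array1, $array2);
var_dump(array_keys($merged));   // keys 0, 1, 2, 3

// array_replace_recursive keeps the keys and overwrites matching ones.
$replaced = array_replace_recursive($array1, $array2);
var_dump(array_keys($replaced)); // keys 1, 2, 3

// $array2 only repeats a value that $array1 already holds,
// so the result is identical to $array1.
var_dump($replaced === $array1); // bool(true)
```

If duplicates should be collected rather than overwritten, array_replace_recursive is not a drop-in substitute; it only fits cases like this one, where the keys are meaningful identifiers.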

Question 1: recursive or not?

But that's not where this question ends. I thought array_*_recursive would be a recursive function:

Recursion is a kind of function call in which a function calls itself. Such functions are also called recursive functions. Structural recursion is a method of problem solving where the solution to a problem depends on solutions to smaller instances of the same problem.

It turns out it isn't. While $array1 and $array2 are associative arrays, both $array1['c'] and $array2['c'] from the example above are indexed arrays with one element: [1 => 500]. Let's merge them:

array_merge_recursive($array1['c'], $array2['c']);
//output: array:2 [0 => 500, 1 => 500]

This is the expected output, because both arrays have a numeric key (1), so the second will be appended to the first. The new array starts with key 0. But let's get back to the string-key example:

array_merge_recursive($array1, $array2);
// output:
// array:3 [
//  "a" => array:2 [
//    1 => 100
//    2 => 200
//  ]
//  "b" => array:1 [
//    3 => 1000
//  ]
//  "c" => array:2 [
//    1 => 500 //<-- why not 0 => 500?
//    2 => 500
//  ]
//]

$array2['c'][1] is appended to $array1['c'], but the result has keys 1 and 2, not 0 and 1 as in the previous example. The main array and its sub-arrays are treated differently when handling integer keys.
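A smaller pair of inputs isolates the difference. This is only a sketch; the key 5 is chosen arbitrarily to make the renumbering visible.

```php
<?php
// Top level: integer keys of every input, including the first,
// are renumbered from 0.
$top = array_merge_recursive([5 => 'x'], [1 => 'y']);
// $top is [0 => 'x', 1 => 'y']

// Nested under a shared string key: the first array's sub-array keeps
// its keys, and values from the second are appended after the highest
// existing integer key.
$nested = array_merge_recursive(['c' => [5 => 'x']], ['c' => [1 => 'y']]);
// $nested is ['c' => [5 => 'x', 6 => 'y']]

var_dump($top, $nested);
```

So the nested merge does append, it just appends to the sub-array of $array1 as it stands rather than to a freshly renumbered copy.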

Question 2: String or integer key makes a big difference.

While writing this question, I found something else. It gets more confusing when the numeric key in a sub-array is replaced with a string key:

$array1 = [
    'c' => [
        'a' => 500
    ]
];
$array2 = [
    'c' => [
        'a' => 500
    ]
];
array_merge_recursive($array1, $array2);
// output:
// array:1 [
//  "c" => array:1 [
//    "a" => array:2 [
//      0 => 500
//      1 => 500
//    ]
//  ]
//]

So with a string key, the two (int) 500 values are collected into an array, array(500, 500), under that key, while with an integer key they are not.
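Stripped of the outer 'c' level, the same contrast can be reproduced with flat arrays. A minimal sketch; $byString and $byInt are illustrative names.

```php
<?php
// Colliding string keys: both scalars end up in a new list under that key.
$byString = array_merge_recursive(['a' => 500], ['a' => 500]);
// $byString is ['a' => [0 => 500, 1 => 500]]

// Colliding integer keys: there is no collision handling at all,
// each value is simply appended with a fresh index.
$byInt = array_merge_recursive([1 => 500], [1 => 500]);
// $byInt is [0 => 500, 1 => 500]

var_dump($byString, $byInt);
```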

Can someone explain this behaviour?

2 answers
闹够了就滚
#2 · 2019-07-12 08:51

If we take a step back and observe how the array_merge*() functions behave with only one array, we get a glimpse into how they treat associative and indexed arrays differently:

$array1 = [
    'k' => [
        1 => 100,
        2 => 200,
    ],
    2 => [
        3 => 1000,
    ],
    'f' => 'gf',
    3 => [
        1 => 500
    ],
    '99' => 'hi',
    5 => 'g'
];

var_dump( array_merge_recursive( $array1 ) );

Output:

array(6) {
  ["k"]=>
  array(2) {
    [1]=>
    int(100)
    [2]=>
    int(200)
  }
  [0]=>
  array(1) {
    [3]=>
    int(1000)
  }
  ["f"]=>
  string(2) "gf"
  [1]=>
  array(1) {
    [1]=>
    int(500)
  }
  [2]=>
  string(2) "hi"
  [3]=>
  string(1) "g"
}

As you can see, it took all numeric keys and ignored their actual value and gave them back to you in the sequence in which they were encountered. I would imagine that the function does this on purpose to maintain sanity (or efficiency) within the underlying C code.

Back to your two-array example: it took the values of $array1, renumbered their integer keys from 0, and then appended the values of $array2.
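The behaviour described above can be approximated in userland PHP. The following is a simplified model, not the actual C implementation: the function names model_merge_recursive and merge_into are made up for this sketch, and it ignores objects and references. It assumes the rules are: start from an empty result, append every integer-keyed value, and recurse only when a string key already exists in the result.

```php
<?php
// Hypothetical helper: merge $src into $dest in place.
function merge_into(array &$dest, array $src): void
{
    foreach ($src as $key => $value) {
        if (!is_string($key)) {
            // Integer key: never compared, always appended.
            $dest[] = $value;
        } elseif (!array_key_exists($key, $dest)) {
            // New string key: copied as-is, inner keys untouched.
            $dest[$key] = $value;
        } else {
            // Existing string key: make sure the destination is an array...
            if (!is_array($dest[$key])) {
                $dest[$key] = [$dest[$key]];
            }
            // ...then recurse for arrays, or append scalars.
            if (is_array($value)) {
                merge_into($dest[$key], $value);
            } else {
                $dest[$key][] = $value;
            }
        }
    }
}

// Hypothetical model of array_merge_recursive for plain arrays.
function model_merge_recursive(array ...$arrays): array
{
    $result = [];
    foreach ($arrays as $array) {
        merge_into($result, $array);
    }
    return $result;
}

$a = ['c' => [1 => 500], 7 => 'x'];
$b = ['c' => [1 => 500], 7 => 'y'];

// For this input the model agrees with the built-in function.
var_dump(model_merge_recursive($a, $b) === array_merge_recursive($a, $b)); // bool(true)
```

Under this model the function is recursive, but the recursion is only entered through a shared string key. The top-level array is rebuilt from scratch, which is why its integer keys restart at 0, while a nested array from $array1 is copied whole and only appended to.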

Whether or not this behavior is sane is a totally separate discussion...

爷的心禁止访问
#3 · 2019-07-12 08:57

You should read the link you provided. It states (emphasis mine):

If the input arrays have the same string keys, then the values for these keys are merged together into an array, and this is done recursively, so that if one of the values is an array itself, the function will merge it with a corresponding entry in another array too. If, however, the arrays have the same numeric key, the later value will not overwrite the original value, but will be appended.
