Introduction (this question was updated on 27 May 2018):
I have one multidimensional PHP array containing 6 sub-arrays, each containing 20 sub-sub-arrays. Each sub-sub-array in turn contains 2 elements: a string (header) and an array holding an unspecified number of keywords (keywords).
I want to compare each of the 120 sub-sub-arrays with the 100 sub-sub-arrays contained in the remaining 5 sub-arrays, so that sub-sub-array 1 in sub-array 1 is compared with sub-sub-arrays 1 through 20 in each of sub-arrays 2 through 6, and so forth.
If enough keywords in two sub-sub-arrays are deemed identical, and their headers are as well (both judged by Levenshtein distance), the two sub-sub-arrays should be merged.
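The comparison scheme described above amounts to visiting every unordered pair of sub-sub-arrays that sit in different sub-arrays. The following is only a minimal sketch of that enumeration. The function name `cross_pairs` and the `[sub-array index, story index]` addressing are assumptions made for illustration, and the outer array is assumed to be indexed 0, 1, 2, and so on.

```php
<?php
// Hypothetical helper (not part of the original script): lists every
// unordered pair of stories that sit in different sub-arrays.
// Each story is addressed as [sub-array index, story index].
function cross_pairs(array $array): array {
    $pairs = array();
    $subCount = count($array);
    for ($i = 0; $i < $subCount; $i++) {
        for ($j = $i + 1; $j < $subCount; $j++) {
            foreach (array_keys($array[$i]) as $a) {
                foreach (array_keys($array[$j]) as $b) {
                    $pairs[] = array(array($i, $a), array($j, $b));
                }
            }
        }
    }
    return $pairs;
}

// 6 sub-arrays of 20 placeholder stories each.
$demo = array_fill(0, 6, array_fill(0, 20, array("header" => "", "keywords" => array())));
echo count(cross_pairs($demo)), "\n"; // 6000
```

With 6 sub-arrays of 20 stories, each of the 120 stories meets 100 others, and counting each pair once gives 120 * 100 / 2 = 6000 comparisons.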
Example script
I have written a script doing exactly this, but for two separate arrays to demonstrate my goal:
<?php
// Maximum Levenshtein distance between two keywords for them to count as identical. Lower it for stricter matching, raise it for looser matching.
$lev_point_value = 3;
// Minimum number of identical keywords (pairs within $lev_point_value) two arrays must have in common to be merged.
$merge_tag_value = 4;
// Maximum Levenshtein distance between two headers for them to count as identical. Lower it for stricter matching, raise it for looser matching.
$merge_head_value = 22;
// Array1 - A story about a monkey, includes a header and keywords.
$array1 = array (
    "header" => "This is a story about a monkey.",
    'keywords' => array( "Trees", "Monkey", "Flying", "Drink", "Vacation", "Coconut", "Big", "Bonobo", "Climbing")
);
// Array2 - Another, but slightly different story about a monkey, includes a header and keywords.
$array2 = array (
    "header" => "This is another, but different story, about a monkey.",
    'keywords' => array( "Monkey", "Big", "Trees", "Bonobo", "Fun", "Dance", "Cow", "Coconuts")
);
// Compares the keywords of two arrays and returns the number of keyword pairs whose Levenshtein distance is at most $lev_point_value.
function sim_tag_index($array1, $array2, $lev_point_value) {
    $merged_tag = 0;
    foreach ($array1['keywords'] as $item1) {
        foreach ($array2['keywords'] as $item2) {
            if (levenshtein($item1, $item2) <= $lev_point_value) {
                $merged_tag++;
            }
        }
    }
    return $merged_tag;
}
// Returns the Levenshtein distance between the headers of two arrays.
function sim_header_index($array1, $array2) {
    return levenshtein($array1['header'], $array2['header']);
}
// Checks sim_tag_index against $merge_tag_value; if that passes, checks sim_header_index against $merge_head_value; if that passes as well, merges the keywords. Otherwise returns an empty array.
function merge_on_sim($array1, $array2, $merge_tag_value, $merge_head_value, $lev_point_value) {
    $group = array();
    if (sim_tag_index($array1, $array2, $lev_point_value) >= $merge_tag_value) {
        // Similar headers have a small distance, so the distance must stay at or below the threshold.
        if (sim_header_index($array1, $array2) <= $merge_head_value) {
            $group = array_unique(array_merge($array1["keywords"], $array2["keywords"]));
        }
    }
    return $group;
}
// Print the result of merge_on_sim.
print_r(merge_on_sim($array1, $array2, $merge_tag_value, $merge_head_value, $lev_point_value));
?>
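One property of the keyword count above may be worth keeping in mind: it counts every keyword pair within the threshold, so a short keyword such as "Big" is within distance 3 of any other three-letter word and can be counted several times, and the comparison is case-sensitive. The following is a hedged sketch of a stricter variant. The name `sim_tag_index_once` and its behaviour (lower-casing, at most one match per keyword of the first story) are assumptions, not part of the script.

```php
<?php
// Hypothetical variant of sim_tag_index (an assumption, not the original
// function): compares keywords case-insensitively and counts each keyword
// of the first story at most once.
function sim_tag_index_once(array $story1, array $story2, int $lev_point_value): int {
    $matches = 0;
    foreach ($story1['keywords'] as $item1) {
        foreach ($story2['keywords'] as $item2) {
            if (levenshtein(strtolower($item1), strtolower($item2)) <= $lev_point_value) {
                $matches++;
                break; // stop at the first match for this keyword
            }
        }
    }
    return $matches;
}

$a = array('keywords' => array("Big", "Monkey"));
$b = array('keywords' => array("big", "Fun", "Cow"));
echo sim_tag_index_once($a, $b, 3), "\n"; // 1
```

On the same two toy stories, the pair-counting approach would report 3, because "Big" is within distance 3 of "big", "Fun" and "Cow".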
Question:
How can I expand or rewrite my script so that it goes through every sub-sub-array, compares it with all sub-sub-arrays found in the other sub-arrays, and then merges the sub-sub-arrays that are deemed identical enough?
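One possible direction, offered only as a sketch under stated assumptions, is to treat the merging as a grouping problem: compare every pair of stories from different sub-arrays and join matching stories into groups with a small union-find. The function name `group_similar` and the `$isSimilar` callback are hypothetical. In practice the callback could wrap `sim_tag_index` and `sim_header_index`, while the demo below uses a toy predicate so that it runs on its own.

```php
<?php
// Sketch only: groups stories from different sub-arrays with a small
// union-find. $isSimilar is a caller-supplied callback (an assumption).
function group_similar(array $array, callable $isSimilar): array {
    // Flatten to a list of [sub-array index, story] entries.
    $flat = array();
    foreach ($array as $subIndex => $stories) {
        foreach ($stories as $story) {
            $flat[] = array($subIndex, $story);
        }
    }

    // Every story starts as the root of its own group.
    $parent = array_keys($flat);
    $find = function (int $i) use (&$parent): int {
        while ($parent[$i] !== $i) {
            $parent[$i] = $parent[$parent[$i]]; // path halving
            $i = $parent[$i];
        }
        return $i;
    };

    $n = count($flat);
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            // Only compare stories that come from different sub-arrays.
            if ($flat[$i][0] === $flat[$j][0]) {
                continue;
            }
            if ($isSimilar($flat[$i][1], $flat[$j][1])) {
                $rootI = $find($i);
                $rootJ = $find($j);
                $parent[$rootJ] = $rootI; // join the two groups
            }
        }
    }

    // Collect the stories of each group under its root.
    $groups = array();
    foreach ($flat as $i => $entry) {
        $groups[$find($i)][] = $entry[1];
    }
    return array_values($groups);
}

$demo = array(
    array(
        array("header" => "A1", 'keywords' => array("x")),
        array("header" => "B1", 'keywords' => array("y")),
    ),
    array(
        array("header" => "A2", 'keywords' => array("x")),
    ),
);
// Toy predicate for the demo: two stories match when they share a keyword.
$shareKeyword = function (array $s1, array $s2): bool {
    return count(array_intersect($s1['keywords'], $s2['keywords'])) > 0;
};
$groups = group_similar($demo, $shareKeyword);
echo count($groups), "\n"; // 2
```

Because groups are joined transitively, three stories end up together when the first matches the second and the second matches the third, even if the first and third would not match directly. Groups containing a single story are the ones that found no match.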
Multidimensional Array Structure
$array = array (
    // Sub-array 1
    array (
        // Story 'Monkey 1' - Has identical sub-sub-arrays 'Monkey 2' and 'Monkey 3' and will be merged with them.
        array (
            "header" => "This is a story about a monkey.",
            'keywords' => array( "Trees", "Monkey", "Flying", "Drink", "Vacation", "Coconut", "Big", "Bonobo", "Climbing")
        ),
        // Story 'Cat 1' - Has identical sub-sub-array 'Cat 2' and will be merged with it.
        array (
            "header" => "Here's a catarific story about a cat",
            'keywords' => array( "meauw", "raaaw", "kitty", "growup", "Fun", "claws", "fish", "salmon")
        )
    ),
    // Sub-array 2
    array (
        // Story 'Monkey 2' - Has identical sub-sub-arrays 'Monkey 1' and 'Monkey 3' and will be merged with them.
        array (
            "header" => "This is another, but different story, about a monkey.",
            'keywords' => array( "Monkey", "Big", "Trees", "Bonobo", "Fun", "Dance", "Cow", "Coconuts")
        ),
        // Story 'Cat 2' - Has identical sub-sub-array 'Cat 1' and will be merged with it.
        array (
            "header" => "Here's a different story about a cat",
            'keywords' => array( "meauwe", "ball", "cat", "kitten", "claws", "sleep", "fish", "purr")
        )
    ),
    // Sub-array 3
    array (
        // Story 'Monkey 3' - Has identical sub-sub-arrays 'Monkey 1' and 'Monkey 2' and will be merged with them.
        array (
            "header" => "This is a third story about a monkey.",
            'keywords' => array( "Jungle", "tree", "monkey", "Bonobo", "Fun", "Dance", "climbing", "Coconut", "pretty")
        ),
        // Story 'Fireman 1' - Has no identical sub-sub-arrays and will not be merged.
        array (
            "header" => "This is a story about a fireman",
            'keywords' => array( "fire", "explosion", "burning", "rescue", "happy", "help", "water", "car")
        )
    )
);
Wanted Multidimensional Array
$array = array (
    // Story 'Monkey 1', 'Monkey 2' and 'Monkey 3' merged.
    array (
        "header" => array( "This is a story about a monkey.", "This is another, but different story, about a monkey.", "This is a third story about a monkey."),
        'keywords' => array( "Trees", "Monkey", "Flying", "Drink", "Vacation", "Coconut", "Big", "Bonobo", "Climbing", "Fun", "Dance", "Cow", "Coconuts", "Jungle", "tree", "pretty")
    ),
    // Story 'Cat 1' and 'Cat 2' merged.
    array (
        "header" => array( "Here's a catarific story about a cat", "Here's a different story about a cat"),
        'keywords' => array( "meauw", "raaaw", "kitty", "growup", "Fun", "claws", "fish", "salmon", "ball", "cat", "kitten", "sleep", "purr")
    )
);
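The wanted shape above, a list of headers plus one de-duplicated keyword list per merged story, could be produced from a group of matching stories roughly as follows. This is a minimal sketch; the helper name `merge_group` and its input (a plain list of stories that have already been judged similar) are assumptions.

```php
<?php
// Hypothetical helper (name and behaviour are assumptions): folds a group of
// stories into one entry with a list of headers and de-duplicated keywords.
function merge_group(array $stories): array {
    $headers = array();
    $keywords = array();
    foreach ($stories as $story) {
        $headers[] = $story['header'];
        $keywords = array_merge($keywords, $story['keywords']);
    }
    return array(
        "header" => $headers,
        // array_unique keeps the first occurrence; array_values reindexes.
        'keywords' => array_values(array_unique($keywords)),
    );
}

$cats = array(
    array("header" => "Here's a catarific story about a cat", 'keywords' => array("meauw", "claws", "fish")),
    array("header" => "Here's a different story about a cat", 'keywords' => array("claws", "fish", "purr")),
);
$merged = merge_group($cats);
echo implode(", ", $merged['keywords']), "\n"; // meauw, claws, fish, purr
```

Note that `array_unique` compares exact strings, so "Monkey" and "monkey" would both survive. The wanted array appears to keep only one of them, which would need an additional case-insensitive pass.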