Content from a large number of web pages into an array

Posted 2019-07-29 07:28

Question:

I have an array ($x) containing 748 URLs. Now, I want to fetch a specific part of each page and put all of those parts into a new array. That is, an array containing 748 pieces of text, each from a different URL defined in the array $x.

Here's the code I've got so far:

foreach ($x as $row) {
    $contents = file_get_contents($row);

    $regex = '/delimiter_start(.*?)delimiter_end/s';
    preg_match_all($regex, $contents, $output); // $output only holds the matches from the current page
}

If I var_dump($output) I get a strange result: the dumps keep printing, one per URL, until I press stop in my browser. They look like this:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(4786) "string 1. The one I want from the first page."
  }
  [1]=>
  array(1) {
    [0]=>
    string(4755) "string 1 again"
  }
}

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(8223) "string 2. The one I want from the second page."
  }
  [1]=>
  array(1) {
    [0]=>
    string(8192) "string 2 again"
  }
}

EDIT: I can actually retrieve the results I'm looking for with $output[0]. But how do I create a new array with the same contents as $output[0] that is accessible outside the loop?

Answer 1:

The output you are seeing from preg_match_all is standard: the matches array contains both the full matched text (at index 0) and each capture group (at index 1 and up).
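For illustration, a minimal standalone example of that layout (the sample string is made up):

$contents = 'delimiter_start hello delimiter_end';
preg_match_all('/delimiter_start(.*?)delimiter_end/s', $contents, $output);

var_dump($output[0][0]); // "delimiter_start hello delimiter_end" (full match)
var_dump($output[1][0]); // " hello " (capture group only)

Applied to your loop, declare an array before the loop and append to it on each pass: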

$lines = array();
foreach ($x as $row) {
    $contents = file_get_contents($row);

    $regex = '/delimiter_start(.*?)delimiter_end/s';
    preg_match_all($regex, $contents, $output);

    // Collect this page's full matches; $lines survives the loop
    if (!empty($output[0])) {
        $lines[] = $output[0];
    }
}
var_dump($lines);
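Note that $lines above ends up as an array of arrays (one sub-array of full matches per page). If you instead want a flat array containing just the text between the delimiters, and want to skip any of the 748 URLs that fail to download, a minimal variation could look like this (skipping failed fetches with continue is an assumption about how you want errors handled):

$lines = array();
foreach ($x as $row) {
    $contents = file_get_contents($row);
    if ($contents === false) {
        continue; // skip URLs that could not be fetched
    }
    if (preg_match_all('/delimiter_start(.*?)delimiter_end/s', $contents, $output)) {
        // $output[1] holds the capture group: the text between the delimiters
        foreach ($output[1] as $match) {
            $lines[] = $match;
        }
    }
}
var_dump($lines);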