Which to use? file_get_contents, file_get_html, or cURL?

Published 2019-08-29 12:36

Question:

I need to scrape data from a table on a web page. I'd then like to store this data in an array, so that I can later store it in a database. I'm very unfamiliar with this functionality, so I'd like to use the most simple method possible.

Which should I use? file_get_contents, file_get_html, cURL?

Answer 1:

  1. Use curl() or file_get_contents() to fetch the contents of the page.
  2. Then use regular expressions (preg_match()) to extract the content you need.
  3. Finally, insert the extracted content into the database.
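A minimal sketch of step 2 above, assuming a simple table layout; the regular expression and the sample HTML are illustrations only, so adapt the pattern to the markup of the page you actually scrape:

```php
<?php
// Sketch of step 2: pull table-cell contents out of fetched HTML
// with a regular expression. The pattern is an assumption about
// the page's markup, not something that works for every table.
function extract_cells(string $html): array {
    preg_match_all('/<td>(.*?)<\/td>/s', $html, $matches);
    return $matches[1];
}

// In the real script, $page would come from step 1, e.g.:
//   $page = file_get_contents('http://example.com/table-page');
$page = '<table><tr><td>Arsenal</td><td>38</td></tr></table>';
$cells = extract_cells($page);
// $cells now holds the cell texts, ready for step 3 (the INSERT)
```

Note that regular expressions get fragile on complex or inconsistent HTML, which is why the second answer below prefers a DOM parser.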

You can use the crontab command (on Linux: crontab -e) to run the PHP script automatically on a schedule.
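For example, a crontab line like the following would run the script once an hour (the PHP binary path and script path are placeholders for your own):

```
0 * * * * /usr/bin/php /path/to/scrape.php
```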

My English is poor, so I welcome any feedback. Thanks!



Answer 2:

I prefer PHP Simple HTML DOM Parser:

http://simplehtmldom.sourceforge.net/

You can then loop through specific elements using its selector syntax. For example, to get the names of all the teams on the link you sent over, save them to an array, and then run a MySQL insert statement, you'd do something like this:

// file_get_html() comes from the Simple HTML DOM library
include_once 'simple_html_dom.php';

$html = file_get_html('http://www.tablesleague.com/england/');

$name_array = array();

// Get all names
foreach ($html->find('div.cell.name.no_border') as $element) {
    // Push the name onto the array
    array_push($name_array, $element->innertext);
}

Then prepare a MySQL statement:

// Use a prepared statement so each name is escaped safely;
// interpolating $name directly into the query string would break
// on quotes and leave you open to SQL injection
$stmt = $mysqli->prepare("INSERT INTO table_name (name) VALUES (?)");
foreach ($name_array as $name) {
    $stmt->bind_param('s', $name);
    $stmt->execute();
}

You could always create a multidimensional array with all the elements you'd like, pull them from the array when you loop through it and upload multiple items for every query.
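A sketch of that idea, split into two helper functions so the shape is clear; the field names and the table_name/points column are assumptions, not taken from the real page:

```php
<?php
// Hypothetical sketch: turn rows of scraped fields into a
// multidimensional array (in the real scraper each inner array
// would come from $element->innertext calls on the row's cells)
function rows_to_teams(array $rows): array {
    $teams = array();
    foreach ($rows as $row) {
        $teams[] = array('name' => $row[0], 'points' => $row[1]);
    }
    return $teams;
}

// Insert every row, reusing one prepared statement for the whole
// batch instead of building a new query string per iteration
function insert_teams(mysqli $mysqli, array $teams): void {
    $stmt = $mysqli->prepare(
        'INSERT INTO table_name (name, points) VALUES (?, ?)'
    );
    foreach ($teams as $team) {
        $stmt->bind_param('ss', $team['name'], $team['points']);
        $stmt->execute();
    }
}
```

Preparing the statement once outside the loop also keeps the per-row cost down when the table is large.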