I have a text file of comma-delimited values, where each row represents one record in the data set. There are about 2 million rows, and I want to parse the file, create a Laravel model from each row, and store each one as a row in my database.
At the moment, I have a class that parses the file line by line as follows:
class LargeFileParser {

    // Reference to the file being parsed
    protected $file;

    // Check that the file exists and create the SplFileObject
    public function __construct($filename, $mode = "r"){
        if (!file_exists($filename)) {
            throw new Exception("File not found");
        }
        $this->file = new \SplFileObject($filename, $mode);
    }

    // Iterate through the text or binary document
    public function iterate($type = "Text", $bytes = NULL)
    {
        if ($type == "Text") {
            return new \NoRewindIterator($this->iterateText());
        } else {
            return new \NoRewindIterator($this->iterateBinary($bytes));
        }
    }

    // Handle text iteration (yields the file one line at a time)
    protected function iterateText()
    {
        $count = 0;
        while (!$this->file->eof()) {
            yield $this->file->fgets();
            $count++;
        }
        return $count;
    }

    // Handle binary iteration (yields the file $bytes bytes at a time)
    protected function iterateBinary($bytes)
    {
        $count = 0;
        while (!$this->file->eof()) {
            yield $this->file->fread($bytes);
            $count++;
        }
    }
}
I then have a controller that handles creating the models and inserting them into the database (I want to be able to run this import occasionally via a route; the route itself is shown after the controller):
class CarrierDataController extends Controller
{
    // Stores the data keys (column names) for a carrier model
    protected $keys;

    // Update the carrier database with the census info
    public function updateData(){
        // File reference
        $file = new LargeFileParser('../storage/app/CENSUS.txt');

        // Get an iterator for the file
        $iterator = $file->iterate("Text");

        // For each line yielded by the iterator, store the data as a carrier in the database
        foreach ($iterator as $index => $line) {
            // The first line sets the keys specified in the file
            if ($index == 0) {
                $this->keys = str_getcsv(strtolower($line), ",", '"');
            }
            // The rest hold the data for each model
            else {
                if ($index <= 100) {
                    // Parse the line into an array of values
                    $dataArray = str_getcsv($line, ",", '"');

                    // Get a data model
                    $dataModel = $this->createCarrierModel(array_combine($this->keys, $dataArray));

                    // Store the data
                    $this->storeData($dataModel);
                }
                else {
                    break;
                }
            }
        }
    }

    // Return a model for the data
    protected function createCarrierModel($dataArray){
        $carrier = Carrier::firstOrNew($dataArray);
        return $carrier;
    }

    // Store the carrier data in the database
    protected function storeData($data){
        $data->save();
    }
}
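For context, the route that kicks this off is just a plain GET route, roughly like this (the URI here is a placeholder, not my actual route):

// In my routes file -- the URI is a placeholder
Route::get('/carriers/update-data', 'CarrierDataController@updateData');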
This works perfectly... that is, while I'm limiting the function to 100 inserts. If I remove that check and let it run over all 2 million rows, it no longer works: either the request times out, or, if I raise the limit with something like ini_set('max_execution_time', 6000);, I eventually get a "failed to respond" message from the browser.
My assumption is that there needs to be some sort of chunking in place, but I'm honestly not sure of the best approach for handling this volume.
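For example, one thing I was considering is buffering the parsed rows and doing bulk inserts through the query builder instead of saving each model one at a time, along these lines (the chunk size of 500 is arbitrary, and the 'carriers' table name is just my guess at what the Carrier model maps to, so treat this as a rough sketch rather than working code):

// Needs the DB facade: use Illuminate\Support\Facades\DB; at the top of the controller
// Inside updateData(), replacing the per-row save:
$buffer = [];

foreach ($iterator as $index => $line) {
    // First line holds the column keys
    if ($index == 0) {
        $this->keys = str_getcsv(strtolower($line), ",", '"');
        continue;
    }

    // Collect rows into a buffer instead of saving them one by one
    $buffer[] = array_combine($this->keys, str_getcsv($line, ",", '"'));

    // Flush the buffer every 500 rows with a single bulk insert
    if (count($buffer) >= 500) {
        DB::table('carriers')->insert($buffer); // 'carriers' table name is an assumption
        $buffer = [];
    }
}

// Insert any remaining rows
if (!empty($buffer)) {
    DB::table('carriers')->insert($buffer);
}

I'm not sure whether batching like this is the right direction, though, or whether a web request is even the right place to run something this large.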
Thank you in advance for any suggestions you may have.