Laravel: insert millions of database rows from models

Posted 2019-03-03 22:35

Question:

I have a text file of comma-delimited values, one record per line, that together represent a data set. There are about 2 million lines, and I want to parse the file, create a Laravel model from each line, and store each one as a row in my database.

At the moment, I have a class that parses the file line by line:

class LargeFileParser{

    // File Reference
    protected $file;

    // Check if file exists and create File Object
    public function __construct($filename, $mode="r"){
        if(!file_exists($filename)){
            throw new Exception("File not found");
        }

        $this->file = new \SplFileObject($filename, $mode);
    }

    // Iterate through the text or binary document
    public function iterate($type = "Text", $bytes = NULL)
    {
        if ($type == "Text") {

            return new \NoRewindIterator($this->iterateText());

        } else {

            return new \NoRewindIterator($this->iterateBinary($bytes));
        }

    }

    // Handle Text iterations
    protected function iterateText()
    {
        $count = 0;

        while (!$this->file->eof()) {
            yield $this->file->fgets();
            $count++;
        }
        return $count;
    }

    // Handle binary iterations
    protected function iterateBinary($bytes)
    {
        $count = 0;

        while (!$this->file->eof()) {
            yield $this->file->fread($bytes);
            $count++;
        }
    }
}

I then have a controller (I want to be able to run this import via a route occasionally) that creates the models and inserts them into the database:

class CarrierDataController extends Controller
{
    // Store the data keys for a carrier model
    protected $keys;

    //Update the Carrier database with the census info
    public function updateData(){
        // File reference
        $file = new LargeFileParser('../storage/app/CENSUS.txt');

        //Get iterator for the file 
        $iterator = $file->iterate("Text");

        // For each iterator, store the data object as a carrier in the database
        foreach ($iterator as $index => $line) {   
            // First line sets the keys specified in the file 
            if($index == 0){
                $this->keys = str_getcsv(strtolower($line), ",", '"');
            }
            // The rest hold the data for each model
            else{                
                if ($index <= 100) {
                    // Parse the data to an array
                    $dataArray = str_getcsv($line, ",", '"');

                    // Get a data model
                    $dataModel = $this->createCarrierModel(array_combine($this->keys, $dataArray));

                    // Store the data
                    $this->storeData($dataModel);
                }
                else{
                    break;
                }
            }   
        }
    }

    // Return a model for the data
    protected function createCarrierModel($dataArray){

        $carrier = Carrier::firstOrNew($dataArray);

        return $carrier;
    }

    // Store the carrier data in the database
    protected function storeData($data){    
        $data->save();
    }
}

This works perfectly... that is, while I'm limiting the function to 100 inserts. If I remove that check and let it run over all 2 million rows, it no longer works: either the request times out, or, if I lift the limit with something like ini_set('max_execution_time', 6000);, I eventually get a "failed to respond" message from the browser.

My assumption is that some sort of chunking needs to be in place, but I'm honestly not sure of the best approach for handling this volume. Something along the lines of the sketch below is what I have in mind (the plain DB::table('carriers') bulk insert and the batch size of 1000 are just guesses on my part):
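
// Untested sketch of the chunking I have in mind: buffer the parsed
// rows and flush them with one bulk insert per batch instead of one
// save() per row. Needs: use Illuminate\Support\Facades\DB;
$batch = [];

foreach ($iterator as $index => $line) {
    // First line holds the column keys
    if ($index == 0) {
        $this->keys = str_getcsv(strtolower($line), ",", '"');
        continue;
    }

    $batch[] = array_combine($this->keys, str_getcsv($line, ",", '"'));

    // Flush every 1000 rows so only one query is issued per batch
    if (count($batch) >= 1000) {
        DB::table('carriers')->insert($batch);
        $batch = [];
    }
}

// Flush the remainder
if (!empty($batch)) {
    DB::table('carriers')->insert($batch);
}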

Thank you in advance for any suggestions you may have.

Answer 1:

I would create an artisan command that handles the import rather than doing this via the browser. Do you really want to make the user wait until this big file is imported? What happens if they use the back button or close the page?
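
A bare-bones skeleton of such a command could look like this (the class name and signature are just examples; the parsing and insert logic would be the same as in your controller):

<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;

class ImportCarrierData extends Command
{
    // Run with: php artisan carriers:import /path/to/CENSUS.txt
    protected $signature = 'carriers:import {file}';

    protected $description = 'Import carrier census data from a CSV file';

    public function handle()
    {
        // Assumes LargeFileParser is autoloadable from here
        $file = new LargeFileParser($this->argument('file'));

        foreach ($file->iterate("Text") as $index => $line) {
            // ... same parsing and (chunked) insert logic as in the
            // controller, but with no HTTP timeout to worry about
        }

        $this->info('Import finished.');
    }
}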

If you want or need some kind of user interaction, such as the user uploading the file and clicking an Import button, push the import onto a job queue using e.g. Beanstalkd. The artisan command mentioned above runs the import, and once it is done you can send the user an e-mail or a Slack notification. If you need UI feedback, make the upload request via AJAX and have the page poll an API endpoint for the status of the import; since the import runs asynchronously, the page can then show a notification, stop a spinner, or display an error message once it completes.
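
A rough sketch of such a queued job (the class name and the notification step are placeholders; the beanstalkd connection would be configured in config/queue.php):

<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ImportCarrierFile implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function handle()
    {
        // Run the import here (e.g. the same logic as the artisan
        // command), then notify the user by mail or Slack.
    }
}

The upload controller then only needs to call ImportCarrierFile::dispatch($path); and can return to the user immediately while the queue worker does the heavy lifting.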