I have a fairly large CSV file (at least for the web) that I don't have control of. It has about 100k rows in it, and will only grow larger.
I'm using the Drupal module Feeds to create nodes based on this data, and its parser batches the parsing in groups of 50 lines. However, the parser doesn't handle quotation marks properly, and fails to parse about 60% of the CSV file. fgetcsv works, but doesn't batch things as far as I can tell.
While trying to read the entire file with fgetcsv, PHP eventually runs out of memory. Therefore I would like to be able to break things up into smaller chunks. Is this possible?
I suspect the problem is that you're storing too much information in memory, rather than how you're reading the CSV file off disk. (i.e., fgetcsv() will only read a line at a time, so if a single line's worth of data is causing you to run out of memory, you're in trouble.)
As such, you simply need to use an approach where you read a batch of lines, process it, discard it from memory, and repeat until you reach the end of the file.
Alternatively, you could execute the CSV processing via the command-line version of PHP and use a custom php.ini that has a much larger memory limit.
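A minimal sketch of the batch-and-discard approach described above (the function name, the batch size of 50, and the callback are placeholders, not part of the original answer):

```php
<?php
// Process a CSV in fixed-size batches so only one batch is ever in memory.
function processCsvInBatches($path, $batchSize, callable $handler) {
    $handle = fopen($path, 'r');
    $batch  = array();
    while (($row = fgetcsv($handle)) !== false) {
        $batch[] = $row;
        if (count($batch) >= $batchSize) {
            $handler($batch);  // e.g. create nodes from these rows
            $batch = array();  // discard the processed rows
        }
    }
    if ($batch) {
        $handler($batch);      // final partial batch
    }
    fclose($handle);
}
```

Because fgetcsv() streams one record at a time, peak memory usage depends only on the batch size, not on the size of the file.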
fgetcsv() works by reading one line at a time from a given file pointer. If PHP is running out of memory, perhaps you are trying to parse the whole file at once, putting it all into a giant array. The solution would be to process it line by line without storing it in a big array.
To answer the batching question more directly: read n lines from the file, then use ftell() to find the location in the file where you ended. Make a note of this point, and then you can return to it at some point in the future by calling fseek() before fgetcsv().
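One way to sketch this resumable read (the function name and signature are assumptions for illustration):

```php
<?php
// Read up to $n CSV rows starting at byte $offset; return the rows and
// the byte offset at which to resume on the next call.
function readCsvBatch($path, $offset, $n) {
    $handle = fopen($path, 'r');
    fseek($handle, $offset);        // jump back to where we stopped last time
    $rows = array();
    while (count($rows) < $n && ($row = fgetcsv($handle)) !== false) {
        $rows[] = $row;
    }
    $resume = ftell($handle);       // make a note of this point for next time
    fclose($handle);
    return array($rows, $resume);
}
```

Persist the returned offset between requests (e.g. in the database) and each batch picks up exactly where the previous one ended.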
Well, create a function to parse a bunch of lines:
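The original code block was not preserved here; a plausible reconstruction, assuming str_getcsv() for quote-aware parsing (the name parseLines is an assumption):

```php
<?php
// Parse an array of raw CSV lines into arrays of fields.
function parseLines(array $lines) {
    $parsed = array();
    foreach ($lines as $line) {
        $parsed[] = str_getcsv($line);  // handles quoted fields correctly
    }
    return $parsed;
}
```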
Then, just batch it up:
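The batching code was also stripped; a self-contained sketch of the idea, buffering raw lines with fgets() and handing each batch to a consumer callback (the function name, the buffer size, and the callback are assumptions):

```php
<?php
// Buffer raw lines with fgets(), then parse and hand off each batch.
// Note: splitting on raw lines breaks if quoted fields contain embedded
// newlines; use an fgetcsv()-based loop for such files.
function batchCsv($path, $size, callable $consume) {
    $handle = fopen($path, 'r');
    $lines  = array();
    while (($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n");
        if (count($lines) >= $size) {       // $size is the knob to tune
            $consume(array_map('str_getcsv', $lines));
            $lines = array();
        }
    }
    if ($lines) {
        $consume(array_map('str_getcsv', $lines));  // trailing partial batch
    }
    fclose($handle);
}
```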
It streams the data in, and you can tune how many rows it buffers by tweaking the variable...