可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Considering i have a 100GB txt file containing millions of lines of text. How could i read this text file by block of lines using PHP?

i can't use file_get_contents(); because the file is too large. fgets() also read the text line by line which will likely takes longer time to finish reading the whole file.

If i'll be using fread($fp,5030) wherein '5030' is some length value for which it has to read. Would there be a case where it won't read the whole line(such as stop at the middle of the line) because it has reached the max length?

回答1:

i can't use file_get_contents(); because the file is too large. fgets() also read the text line by line which will likely takes longer time to finish reading the whole file.

Don't see, why you shouldn't be able to use fgets()

$blocksize = 50; // in "number of lines"
while (!feof($fh)) {
  $lines = array();
  $count = 0;
  while (!feof($fh) && (++$count <= $blocksize)) {
    $lines[] = fgets($fh);
  }
  doSomethingWithLines($lines);
}

Reading 100GB will take time anyway.

回答2:

The fread approach sounds like a reasonable solution. You can detect whether you've reached the end of a line by checking whether the final character in the string is a newline character ('\n'). If it isn't, then you can either read some more characters and append them to your existing string, or you can trim characters from your string back to the last newline, and then use fseek to adjust your position in the file.

Side point: Are you aware that reading a 100GB file will take a very long time?

回答3:

i think that you have to use fread($fp, somesize), and check manually if you have founded the end of the line, otherwise read another chunk.

Hope this helps.

回答4:

I would recommend implementing the reading of a single line within a function, hiding the implementation details of that specific step from the rest of your code - the processing function must not care how the line was retrieved. You can then implement your first version using fgets() and then try other methods if you notice that it is too slow. It could very well be that the initial implementation is too slow, but the point is: you won't know until you've benchmarked.

回答5:

I know this is an old question, but I think there is value for a new answer for anyone that finds this question eventually.

I agree that reading 100GB takes time, that I why I also agree that we need to find the most effective option to read it so it can be as little as possible instead of just thinking "who cares how much it is if is already a lot", so, lets find out our lowest time possible.