What is the appropriate way of dealing with large text files in Objective-C? Let's say I need to read each line separately and want to treat each line as an NSString. What is the most efficient way of doing this?
One solution is using the NSString method:
+ (id)stringWithContentsOfFile:(NSString *)path
encoding:(NSStringEncoding)enc
error:(NSError **)error
and then split the lines with a newline separator, and then iterate over the elements in the array. However, this seems fairly inefficient. Is there no easy way to treat the file as a stream, enumerating over each line, instead of just reading it all in at once? Kinda like Java's java.io.BufferedReader.
Just like @porneL said, the C API is very handy.
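A minimal sketch of what the C route can look like, assuming a fixed 4096-byte buffer. The names `each_line_fgets`, `line_fn` and `add_length` are made up for illustration, and lines longer than the buffer arrive in several pieces:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives one line without its terminator. */
typedef void (*line_fn)(const char *line, void *ctx);

/* Example callback: adds the length of each line to a size_t counter. */
void add_length(const char *line, void *ctx)
{
    *(size_t *)ctx += strlen(line);
}

/* Reads `path` line by line with fgets and calls `fn` once per read.
 * Lines longer than the buffer are delivered in pieces.
 * Returns the number of callback calls, or -1 if the file cannot be opened. */
long each_line_fgets(const char *path, line_fn fn, void *ctx)
{
    assert(path != NULL && fn != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char buf[4096];
    long count = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* cut at the first \r or \n */
        fn(buf, ctx);
        count++;
    }
    fclose(fp);
    return count;
}
```

Inside the callback, each buffer could be turned into an NSString with `+[NSString stringWithUTF8String:]`, assuming the file is UTF-8.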
As others have answered, both NSInputStream and NSFileHandle are fine options, but it can also be done in a fairly compact way with NSData and memory mapping:
BRLineReader.h
BRLineReader.m
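The BRLineReader listing is not shown here, so what follows is not that class. It is only a rough plain-C sketch of the same memory-mapping idea, using POSIX `mmap` in place of NSData; `each_line_mmap`, `span_fn` and `add_span_length` are hypothetical names:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback: `line` points into the mapping and is NOT NUL-terminated. */
typedef void (*span_fn)(const char *line, size_t len, void *ctx);

/* Example callback: sums the line lengths into a size_t counter. */
void add_span_length(const char *line, size_t len, void *ctx)
{
    (void)line;
    *(size_t *)ctx += len;
}

/* Maps `path` read-only and reports each line, split on '\n'.
 * The '\n' and a directly preceding '\r' are excluded from the span.
 * Returns the number of lines, or -1 on error. */
long each_line_mmap(const char *path, span_fn fn, void *ctx)
{
    assert(path != NULL && fn != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t size = (size_t)st.st_size;
    char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return -1;

    long count = 0;
    const char *p = base, *end = base + size;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *stop = nl ? nl : end;
        size_t len = (size_t)(stop - p);
        if (len > 0 && p[len - 1] == '\r')
            len--;                      /* tolerate \r\n line endings */
        fn(p, len, ctx);
        count++;
        p = nl ? nl + 1 : end;
    }
    munmap(base, size);
    return count;
}
```

On the Foundation side, reading the NSData with the `NSDataReadingMappedIfSafe` option plays the role of `mmap`, and each span can be converted with `-[NSString initWithBytes:length:encoding:]`.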
I found the response by @lukaswelte and the code from Dave DeLong very helpful. I was looking for a solution to this problem but needed to parse large files by \r\n, not just \n. The code as written contains a bug when parsing by a delimiter of more than one character. I've changed the code as below.
.h file:
.m file:
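The modified class is not shown here. As a stand-in, this is a hedged plain-C sketch of the underlying technique, not the code referred to above: read fixed-size chunks, keep the unfinished tail for the next read, and restart the search a few bytes before the new data so a delimiter split across two chunks is still found. `each_record`, `find_delim`, `span_fn` and `add_span_length` are made-up names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: `rec` is NOT NUL-terminated. */
typedef void (*span_fn)(const char *rec, size_t len, void *ctx);

/* Example callback: sums the record lengths into a size_t counter. */
void add_span_length(const char *rec, size_t len, void *ctx)
{
    (void)rec;
    *(size_t *)ctx += len;
}

/* Offset of the first `delim` in buf[from..len), or `len` if there is none. */
static size_t find_delim(const char *buf, size_t len, size_t from,
                         const char *delim, size_t dlen)
{
    for (size_t i = from; i + dlen <= len; i++)
        if (memcmp(buf + i, delim, dlen) == 0)
            return i;
    return len;
}

/* Splits the stream on `delim`, reading `chunk` bytes at a time.
 * Returns the number of records, or -1 if memory runs out. */
long each_record(FILE *fp, const char *delim, size_t chunk, span_fn fn, void *ctx)
{
    assert(fp != NULL && delim != NULL && delim[0] != '\0' && chunk > 0 && fn != NULL);
    size_t dlen = strlen(delim), cap = chunk, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return -1;

    long count = 0;
    for (;;) {
        if (cap - len < chunk) {        /* cap >= chunk, so doubling makes room */
            char *nbuf = realloc(buf, cap * 2);
            if (nbuf == NULL) {
                free(buf);
                return -1;
            }
            buf = nbuf;
            cap *= 2;
        }
        size_t got = fread(buf + len, 1, chunk, fp);
        if (got == 0)
            break;
        /* Back up dlen-1 bytes: the delimiter may straddle the chunk boundary. */
        size_t from = len >= dlen - 1 ? len - (dlen - 1) : 0;
        len += got;

        size_t start = 0, hit;
        while ((hit = find_delim(buf, len, from, delim, dlen)) < len) {
            fn(buf + start, hit - start, ctx);
            count++;
            start = hit + dlen;
            from = start;
        }
        memmove(buf, buf + start, len - start);   /* keep the unfinished tail */
        len -= start;
    }
    if (len > 0) {                      /* last record without a trailing delimiter */
        fn(buf, len, ctx);
        count++;
    }
    free(buf);
    return count;
}
```

With a chunk size of 3 and the input "ab\r\ncd\r\n\r\nef", every \r\n is split across two reads, which is exactly the case a search limited to the newest chunk would miss.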
I am adding this because all the other answers I tried fell short one way or another. The following method can handle large files, arbitrarily long lines, and empty lines. It has been tested with actual content and strips the newline character from the output.
Credit goes to @Adam Rosenfield and @sooop
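The method itself is not shown here. For comparison only, this is a sketch of another way to get the same three properties in plain C, using POSIX `getline` (a swapped-in technique, not the approach credited above). `each_line_getline`, `line_fn` and `track_longest` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L   /* for getline */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical callback: `line` is NUL-terminated, `len` excludes the terminator. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Example callback: remembers the longest line length seen so far. */
void track_longest(const char *line, size_t len, void *ctx)
{
    (void)line;
    size_t *longest = ctx;
    if (len > *longest)
        *longest = len;
}

/* Calls `fn` for every line of `path`, including empty ones.
 * getline grows its buffer as needed, so line length is limited only by memory.
 * Returns the number of lines, or -1 if the file cannot be opened. */
long each_line_getline(const char *path, line_fn fn, void *ctx)
{
    assert(path != NULL && fn != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char *line = NULL;   /* allocated and resized by getline */
    size_t cap = 0;
    ssize_t n;
    long count = 0;
    while ((n = getline(&line, &cap, fp)) != -1) {
        size_t len = (size_t)n;
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';         /* strip \n */
        if (len > 0 && line[len - 1] == '\r')
            line[--len] = '\0';         /* strip the \r of a \r\n pair */
        fn(line, len, ctx);
        count++;
    }
    free(line);
    fclose(fp);
    return count;
}
```

Only one line is held in memory at a time, so the file size does not matter; a single enormous line is still the worst case.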
That's a great question. I think @Diederik has a good answer, although it's unfortunate that Cocoa doesn't have a mechanism for exactly what you want to do.
NSInputStream allows you to read chunks of N bytes (very similar to java.io.BufferedReader), but you have to convert it to an NSString on your own, then scan for newlines (or whatever other delimiter) and save any remaining characters for the next read, or read more characters if a newline hasn't been read yet. (NSFileHandle lets you read an NSData which you can then convert to an NSString, but it's essentially the same process.)
Apple has a Stream Programming Guide that can help fill in the details, and this SO question may help as well if you're going to be dealing with uint8_t* buffers.
If you're going to be reading strings like this frequently (especially in different parts of your program) it would be a good idea to encapsulate this behavior in a class that can handle the details for you, or even to subclass NSInputStream (it's designed to be subclassed) and add methods that allow you to read exactly what you want.
For the record, I think this would be a nice feature to add, and I'll be filing an enhancement request for something that makes this possible. :-)
Edit: Turns out this request already exists. There's a Radar dating from 2006 for this (rdar://4742914 for Apple-internal people).
Building on @Adam Rosenfield's answer, the format string of fscanf can be changed so that it works with OS X, Linux, and Windows line endings.
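The exact format string is not shown here, so the one below is an assumption: a scanset that stops at either \r or \n, followed by a suppressed scanset that swallows the terminator. A hedged sketch with hypothetical names (`LINE_FMT`, `each_line_fscanf`, `add_length`):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed format: up to 4095 non-terminator bytes, then skip any run of \r and \n. */
#define LINE_FMT "%4095[^\r\n]%*[\r\n]"

/* Hypothetical callback: receives one NUL-terminated piece of text. */
typedef void (*line_fn)(const char *piece, void *ctx);

/* Example callback: adds the length of each piece to a size_t counter. */
void add_length(const char *piece, void *ctx)
{
    *(size_t *)ctx += strlen(piece);
}

/* Calls `fn` for every non-empty line of `fp`.
 * Caveats: blank lines are skipped, because the format swallows whole runs
 * of terminators, and lines over 4095 bytes arrive in several pieces.
 * Returns the number of callback calls. */
long each_line_fscanf(FILE *fp, line_fn fn, void *ctx)
{
    assert(fp != NULL && fn != NULL);
    char buf[4096];
    long count = 0;
    for (;;) {
        int rc = fscanf(fp, LINE_FMT, buf);
        if (rc == EOF)
            break;                      /* end of input */
        if (rc == 0) {
            /* Input starts with a terminator: skip the run and retry. */
            if (fscanf(fp, "%*[\r\n]") == EOF)
                break;
            continue;
        }
        fn(buf, ctx);
        count++;
    }
    return count;
}
```

Because \r and \n are both treated as terminators, \n, \r\n and a lone \r all end a line. If empty lines matter, this format is the wrong tool, since it cannot tell one terminator from several.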