After searching for a while, I found quite a few discussions on how to count the number of lines in a file.
For example these three:
c# how do I count lines in a textfile
Determine the number of lines within a text file
How to count lines fast?
So, I went ahead and ended up using what seems to be the most efficient (at least memory-wise?) method that I could find:
private static int countFileLines(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null)
        {
            i++;
        }
        return i;
    }
}
But this takes forever when the lines in the file are very long. Is there really no faster solution?
I've been trying to use StreamReader.Read() or StreamReader.Peek(), but I can't (or don't know how to) make either of them move on to the next line as soon as there's 'stuff' (chars? text?).
Any ideas please?
CONCLUSION/RESULTS (After running some tests based on the answers provided):
I tested the 5 methods below on two different files and got consistent results that seem to indicate that plain old StreamReader.ReadLine() is still one of the fastest ways... To be honest, I'm perplexed after all the comments and discussion in the answers.
File #1:
Size: 3,631 KB
Lines: 56,870
Results in seconds for File #1:
0.02 --> ReadLine method.
0.04 --> Read method.
0.29 --> ReadByte method.
0.25 --> ReadLines.Count method.
0.04 --> ReadWithBufferSize method.
File #2:
Size: 14,499 KB
Lines: 213,424
Results in seconds for File #2:
0.08 --> ReadLine method.
0.19 --> Read method.
1.15 --> ReadByte method.
1.02 --> ReadLines.Count method.
0.08 --> ReadWithBufferSize method.
Here are the 5 methods I tested based on all the feedback I received:
private static int countWithReadLine(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null)
        {
            i++;
        }
        return i;
    }
}
private static int countWithRead(string filePath)
{
    using (StreamReader _reader = new StreamReader(filePath))
    {
        int c = 0, count = 0;
        // Read() returns one decoded character at a time, or -1 at end of stream.
        while ((c = _reader.Read()) != -1)
        {
            if (c == 10) // 10 is '\n' (line feed)
            {
                count++;
            }
        }
        return count;
    }
}
private static int countWithReadByte(string filePath)
{
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        int b;
        // ReadByte() returns the next raw byte, or -1 at end of stream.
        b = s.ReadByte();
        while (b >= 0)
        {
            if (b == 10) // 10 is '\n' (line feed)
            {
                i++;
            }
            b = s.ReadByte();
        }
        return i;
    }
}
private static int countWithReadLinesCount(string filePath)
{
    // Count() is a LINQ extension method, so this needs "using System.Linq;".
    return File.ReadLines(filePath).Count();
}
private static int countWithReadAndBufferSize(string filePath)
{
    int bufferSize = 512;
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
        int i = 0;
        byte[] b = new byte[bufferSize];
        int n = 0;
        n = s.Read(b, 0, bufferSize);
        while (n > 0)
        {
            i += countByteLines(b, n);
            n = s.Read(b, 0, bufferSize);
        }
        return i;
    }
}
private static int countByteLines(byte[] b, int n)
{
    int i = 0;
    for (int j = 0; j < n; j++)
    {
        if (b[j] == 10)
        {
            i++;
        }
    }
    return i;
}
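One way to obtain timings like the ones above is to wrap each method in a Stopwatch. The sketch below is only an assumption about how such a measurement could be set up, not the harness that produced the numbers; LineCountTimer and Time are hypothetical names introduced for illustration.

```csharp
using System;
using System.Diagnostics;

public static class LineCountTimer
{
    // Hypothetical helper: runs one counting function once and returns
    // the count it produced together with the elapsed wall-clock seconds.
    public static (int Count, double Seconds) Time(Func<string, int> counter, string filePath)
    {
        Stopwatch sw = Stopwatch.StartNew();
        int count = counter(filePath);
        sw.Stop();
        return (count, sw.Elapsed.TotalSeconds);
    }
}
```

From inside the class that holds the five methods, a call could look like `var result = LineCountTimer.Time(countWithReadLine, filePath);`. A single run mostly measures whether the file is already in the OS cache, so repeating each measurement several times and comparing the runs is probably safer than trusting one number.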
StreamReader is not the fastest way to read files in general, because of the small overhead of decoding the bytes to characters, so reading the file into a byte array is faster. The results I get are a bit different each time due to caching and other processes, but here is one of the results I got (in milliseconds) with a 16 MB file:
In general, File.ReadLines should be a little bit slower than a StreamReader.ReadLine loop. File.ReadAllBytes is slower with bigger files and will throw an out-of-memory exception with huge files. The default buffer size for FileStream is 4K, but on my machine 64K seemed the fastest.
and tested with:
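As a rough illustration of the byte-array approach with a 64K buffer described above, here is a minimal sketch. It is not the code used for the measurements; NewlineCounter, CountNewlines and BlockSize are hypothetical names, and the 64K value is simply the figure mentioned in the text, which may not be the best choice on other machines.

```csharp
using System;
using System.IO;

public static class NewlineCounter
{
    // Assumption: 64K, the size reported as fastest in the text above.
    private const int BlockSize = 64 * 1024;

    // Counts '\n' bytes by reading the file in large blocks,
    // without decoding the bytes to characters.
    public static int CountNewlines(string filePath)
    {
        using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, BlockSize, FileOptions.SequentialScan))
        {
            byte[] buffer = new byte[BlockSize];
            int count = 0;
            int bytesRead;
            // Read returns 0 only at end of file; it may return fewer
            // bytes than requested, so only the first bytesRead are scanned.
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < bytesRead; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        count++;
                    }
                }
            }
            return count;
        }
    }
}
```

Like the byte-based methods in the question, this counts line-feed bytes, so a final line without a trailing newline is not counted, whereas a ReadLine loop would count it. It also assumes an encoding such as ASCII or UTF-8, in which byte 10 only ever means a line feed.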