I tried unxutils' wc -l, but it crashed on 1GB files. Then I tried this C# code:
long count = 0;
using (StreamReader r = new StreamReader(f))
{
    string line;
    while ((line = r.ReadLine()) != null)
    {
        count++;
    }
}
return count;
It reads a 500MB file in 4 seconds. I also tried reading the file in blocks and counting newline bytes:
var size = 256;
var bytes = new byte[size];
var count = 0;
byte query = Convert.ToByte('\n');
using (var stream = File.OpenRead(file))
{
    int many;
    do
    {
        many = stream.Read(bytes, 0, size);
        // Only count the bytes actually read, so stale data left over
        // from the previous iteration isn't counted on the final,
        // partial read. (Take/Count need System.Linq.)
        count += bytes.Take(many).Count(a => a == query);
    } while (many > 0); // a short read doesn't always mean end of stream
}
That reads in 10 seconds. I also tried reading byte by byte:
var count = 0;
int query = (int)Convert.ToByte('\n');
using (var stream = File.OpenRead(file))
{
    int current;
    while ((current = stream.ReadByte()) != -1)
    {
        if (current == query)
        {
            count++;
        }
    }
}
Takes 7 seconds. Is there anything faster I haven't tried yet?
File.ReadLines was introduced in .NET 4.0 and runs in 4 seconds, the same time as the first code snippet.
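A minimal sketch of that approach (file is your path; File.ReadLines needs System.IO, LongCount needs System.Linq):

// ReadLines enumerates lazily, so the whole file is never held in memory.
return File.ReadLines(file).LongCount();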
If you really want fast, consider C code. If this is a command-line utility, it will be faster because it won't have to initialize the CLR or .NET, and it won't allocate a new string for each line read from the file, which probably helps throughput. I don't have any files with 1G lines, so I cannot compare. You can try, though:
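Not the original answer's code, but a minimal sketch of what such a utility could look like, using a plain fread loop (the 64KB buffer size is an arbitrary choice):

#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror(argv[1]); return 1; }

    char buf[1 << 16];              /* 64KB buffer; tune as needed */
    long long count = 0;
    size_t n;

    /* Count newline bytes in each block read from the file. */
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; i++)
            if (buf[i] == '\n')
                count++;

    fclose(f);
    printf("%lld\n", count);
    return 0;
}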
Your first approach does look like the optimal solution already. Keep in mind that you're mostly not CPU bound but limited by the HD's read speed, which at 500MB / 4sec = 125MB/s is already quite fast. The only way to get faster than that is via RAID or using SSDs, not so much via a better algorithm.
I think that your answer looks good. The only thing I would add is to experiment with the buffer size, since it can noticeably change performance. For guidance on choosing one, see: Optimum file buffer read size?
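For illustration, a sketch of the block-reading approach with a tunable buffer; the 64KB value and the explicit FileStream buffer size are assumptions to experiment with, not recommendations:

const int bufferSize = 64 * 1024; // the knob to tune
var bytes = new byte[bufferSize];
long count = 0;
// Match the FileStream's internal buffer to the read buffer.
using (var stream = new FileStream(file, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, bufferSize))
{
    int read;
    while ((read = stream.Read(bytes, 0, bufferSize)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            if (bytes[i] == (byte)'\n')
            {
                count++;
            }
        }
    }
}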
Are you just looking for a tool that counts lines in a file efficiently? If so, try MS LogParser. Something like the example below will give you the number of lines.
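A sketch, assuming LogParser 2.x and its TEXTLINE input format (big.log is a placeholder for your file):

LogParser "SELECT COUNT(*) AS Lines FROM big.log" -i:TEXTLINE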
Have you tried flex? A small line-counting lexer is quick to write and compile; it accepts input on stdin and outputs the number of lines.
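A minimal sketch of such a lexer (lc.l is my file name; %option noyywrap avoids having to link -lfl):

%{
    long long n = 0;   /* line counter */
%}
%option noyywrap
%%
\n      n++;
.       ;
%%
int main(void)
{
    yylex();
    printf("%lld\n", n);
    return 0;
}

Compile and run, for example:

flex lc.l
cc lex.yy.c -o lc
./lc < bigfile.txt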