Is StreamReader.Readline() really the fastest meth

2019-02-04 19:17发布

While looking around for a while I found quite a few discussions on how to figure out the number of lines in a file.

For example these three:
c# how do I count lines in a textfile
Determine the number of lines within a text file
How to count lines fast?

So, I went ahead and ended up using what seems to be the most efficient (at least memory-wise?) method that I could find:

private static int countFileLines(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
        int i = 0;
        while (r.ReadLine() != null) 
        { 
            i++; 
        }
        return i;
    }
}

But this takes forever when the lines themselves from the file are very long. Is there really not a faster solution to this?

I've been trying to use StreamReader.Read() or StreamReader.Peek() but I can't (or don't know how to) make the either of them move on to the next line as soon as there's 'stuff' (chars? text?).

Any ideas please?


CONCLUSION/RESULTS (After running some tests based on the answers provided):

I tested the 5 methods below on two different files and I got consistent results that seem to indicate that plain old StreamReader.ReadLine() is still one of the fastest ways... To be honest, I'm perplexed after all the comments and discussion in the answers.

File #1:
Size: 3,631 KB
Lines: 56,870

Results in seconds for File #1:
0.02 --> ReadLine method.
0.04 --> Read method.
0.29 --> ReadByte method.
0.25 --> Readlines.Count method.
0.04 --> ReadWithBufferSize method.

File #2:
Size: 14,499 KB
Lines: 213,424

Results in seconds for File #1:
0.08 --> ReadLine method.
0.19 --> Read method.
1.15 --> ReadByte method.
1.02 --> Readlines.Count method.
0.08 --> ReadWithBufferSize method.

Here are the 5 methods I tested based on all the feedback I received:

private static int countWithReadLine(string filePath)
{
    using (StreamReader r = new StreamReader(filePath))
    {
    int i = 0;
    while (r.ReadLine() != null)
    {
        i++;
    }
    return i;
    }
}

private static int countWithRead(string filePath)
{
    using (StreamReader _reader = new StreamReader(filePath))
    {
    int c = 0, count = 0;
    while ((c = _reader.Read()) != -1)
    {
        if (c == 10)
        {
        count++;
        }
    }
    return count;
    }            
}

private static int countWithReadByte(string filePath)
{
    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
    int i = 0;
    int b;

    b = s.ReadByte();
    while (b >= 0)
    {
        if (b == 10)
        {
        i++;
        }
        b = s.ReadByte();
    }
    return i;
    }
}

private static int countWithReadLinesCount(string filePath)
{
    return File.ReadLines(filePath).Count();
}

private static int countWithReadAndBufferSize(string filePath)
{
    int bufferSize = 512;

    using (Stream s = new FileStream(filePath, FileMode.Open))
    {
    int i = 0;
    byte[] b = new byte[bufferSize];
    int n = 0;

    n = s.Read(b, 0, bufferSize);
    while (n > 0)
    {
        i += countByteLines(b, n);
        n = s.Read(b, 0, bufferSize);
    }
    return i;
    }
}

private static int countByteLines(byte[] b, int n)
{
    int i = 0;
    for (int j = 0; j < n; j++)
    {
    if (b[j] == 10)
    {
        i++;
    }
    }

    return i;
}

7条回答
放荡不羁爱自由
2楼-- · 2019-02-04 19:28

No, it is not. Point is - it materializes the strings, which is not needed.

To COUNT it you are much better off to ignore the "string" Part and to go the "line" Part.

a LINE is a seriees of bytes ending with \r\n (13, 10 - CR LF) or another marker.

Just run along the bytes, in a buffered stream, counting the number of appearances of your end of line marker.

查看更多
我想做一个坏孩纸
3楼-- · 2019-02-04 19:29

Yes, reading lines like that is the fastest and easiest way in any practical sense.

There are no shortcuts here. Files are not line based, so you have to read every single byte from the file to determine how many lines there are.

As TomTom pointed out, creating the strings is not strictly needed to count the lines, but a vast majority of the time spent will be waiting for the data to be read from the disk. Writing a much more complicated algorithm would perhaps shave off a percent of the execution time, and it would dramatically increase the time for writing and testing the code.

查看更多
Anthone
4楼-- · 2019-02-04 19:30

The best way to know how to do this fast is to think about the fastest way to do it without using C/C++.

In assembly there is a CPU level operation that scans memory for a character so in assembly you would do the following

  • Read big part (or all) of the file into memory
  • Execute the SCASB command
  • Repeat as needed

So, in C# you want the compiler to get as close to that as possible.

查看更多
【Aperson】
5楼-- · 2019-02-04 19:34
public static int CountLines(Stream stm)
{
    StreamReader _reader = new StreamReader(stm);
    int c = 0, count = 0;
    while ((c = _reader.Read()) != -1)
    {
        if (c == '\n')
        {
            count++;
        }
    }
    return count;
}
查看更多
叼着烟拽天下
6楼-- · 2019-02-04 19:38

I tried multiple methods and tested their performance:

The one that reads a single byte is about 50% slower than the other methods. The other methods all return around the same amount of time. You could try creating threads and doing this asynchronously, so while you are waiting for a read you can start processing a previous read. That sounds like a headache to me.

I would go with the one liner: File.ReadLines(filePath).Count(); it performs as well as the other methods I tested.

        private static int countFileLines(string filePath)
        {
            using (StreamReader r = new StreamReader(filePath))
            {
                int i = 0;
                while (r.ReadLine() != null)
                {
                    i++;
                }
                return i;
            }
        }

        private static int countFileLines2(string filePath)
        {
            using (Stream s = new FileStream(filePath, FileMode.Open))
            {
                int i = 0;
                int b;

                b = s.ReadByte();
                while (b >= 0)
                {
                    if (b == 10)
                    {
                        i++;
                    }
                    b = s.ReadByte();
                }
                return i + 1;
            }
        }

        private static int countFileLines3(string filePath)
        {
            using (Stream s = new FileStream(filePath, FileMode.Open))
            {
                int i = 0;
                byte[] b = new byte[bufferSize];
                int n = 0;

                n = s.Read(b, 0, bufferSize);
                while (n > 0)
                {
                    i += countByteLines(b, n);
                    n = s.Read(b, 0, bufferSize);
                }
                return i + 1;
            }
        }

        private static int countByteLines(byte[] b, int n)
        {
            int i = 0;
            for (int j = 0; j < n; j++)
            {
                if (b[j] == 10)
                {
                    i++;
                }
            }

            return i;
        }

        private static int countFileLines4(string filePath)
        {
            return File.ReadLines(filePath).Count();
        }
查看更多
贪生不怕死
7楼-- · 2019-02-04 19:40

There are numerous ways to read a file. Usually, the fastest way is the simplest:

using (StreamReader sr = File.OpenText(fileName))
{
        string s = String.Empty;
        while ((s = sr.ReadLine()) != null)
        {
               //do what you gotta do here
        }
}

This page does a great performance comparison between several different techniques including using BufferedReaders, reading into StringBuilder objects, and into an entire array.

查看更多
登录 后发表回答