Determine the number of lines within a text file

2019-01-02 17:16发布

Is there an easy way to programmatically determine the number of lines within a text file?

11条回答
永恒的永恒
2楼-- · 2019-01-02 17:46

You could quickly read it in, and increment a counter, just use a loop to increment, doing nothing with the text.

查看更多
爱死公子算了
3楼-- · 2019-01-02 17:46

Reading a file in and by itself takes some time, garbage collecting the result is another problem as you read the whole file just to count the newline character(s),

At some point, someone is going to have to read the characters in the file, regardless if this the framework or if it is your code. This means you have to open the file and read it into memory if the file is large this is going to potentially be a problem as the memory needs to be garbage collected.

Nima Ara made a nice analysis that you might take into consideration

Here is the solution proposed, as it reads 4 characters at a time, counts the line feed character and re-uses the same memory address again for the next character comparison.

private const char CR = '\r';  
private const char LF = '\n';  
private const char NULL = (char)0;

public static long CountLinesMaybe(Stream stream)  
{
    Ensure.NotNull(stream, nameof(stream));

    var lineCount = 0L;

    var byteBuffer = new byte[1024 * 1024];
    const int BytesAtTheTime = 4;
    var detectedEOL = NULL;
    var currentChar = NULL;

    int bytesRead;
    while ((bytesRead = stream.Read(byteBuffer, 0, byteBuffer.Length)) > 0)
    {
        var i = 0;
        for (; i <= bytesRead - BytesAtTheTime; i += BytesAtTheTime)
        {
            currentChar = (char)byteBuffer[i];

            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 1];
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 2];
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 3];
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
                i -= BytesAtTheTime - 1;
            }
        }

        for (; i < bytesRead; i++)
        {
            currentChar = (char)byteBuffer[i];

            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
            }
        }
    }

    if (currentChar != LF && currentChar != CR && currentChar != NULL)
    {
        lineCount++;
    }
    return lineCount;
}

Above you can see that a line is read one character at a time as well by the underlying framework as you need to read all characters to see the line feed.

If you profile it as done bay Nima you would see that this is a rather fast and efficient way of doing this.

查看更多
后来的你喜欢了谁
4楼-- · 2019-01-02 17:47

The easiest:

int lines = File.ReadAllLines("myfile").Length;
查看更多
只若初见
5楼-- · 2019-01-02 17:48

You can launch the "wc.exe" executable (comes with UnixUtils and does not need installation) run as an external process. It supports different line count methods (like unix vs mac vs windows).

查看更多
萌妹纸的霸气范
6楼-- · 2019-01-02 17:51

Seriously belated edit: If you're using .NET 4.0 or later

The File class has a new ReadLines method which lazily enumerates lines rather than greedily reading them all into an array like ReadAllLines. So now you can have both efficiency and conciseness with:

var lineCount = File.ReadLines(@"C:\file.txt").Count();

Original Answer

If you're not too bothered about efficiency, you can simply write:

var lineCount = File.ReadAllLines(@"C:\file.txt").Length;

For a more efficient method you could do:

var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}

Edit: In response to questions about efficiency

The reason I said the second was more efficient was regarding memory usage, not necessarily speed. The first one loads the entire contents of the file into an array which means it must allocate at least as much memory as the size of the file. The second merely loops one line at a time so it never has to allocate more than one line's worth of memory at a time. This isn't that important for small files, but for larger files it could be an issue (if you try and find the number of lines in a 4GB file on a 32-bit system, for example, where there simply isn't enough user-mode address space to allocate an array this large).

In terms of speed I wouldn't expect there to be a lot in it. It's possible that ReadAllLines has some internal optimisations, but on the other hand it may have to allocate a massive chunk of memory. I'd guess that ReadAllLines might be faster for small files, but significantly slower for large files; though the only way to tell would be to measure it with a Stopwatch or code profiler.

查看更多
登录 后发表回答