One way to get the number of lines in a file is this method in PowerShell:
PS C:\Users\Pranav\Desktop\PS_Test_Scripts> $a=Get-Content .\sub.ps1
PS C:\Users\Pranav\Desktop\PS_Test_Scripts> $a.count
34
PS C:\Users\Pranav\Desktop\PS_Test_Scripts>
However, when I have a large 800 MB text file, how do I get its line count without reading the whole file?
The above method consumes too much RAM, crashing the script or taking too long to complete.
Here's a PowerShell script I cobbled together which demonstrates a few different methods of counting lines in a text file, along with the time and memory required for each method. The results (below) show clear differences in the time and memory requirements. For my tests, it looks like the sweet spot was Get-Content, using a ReadCount setting of 100. The other tests required significantly more time and/or memory usage.
Here are the results for a text file containing ~95k lines (104 MB):
Here are the results for a larger file (~285k lines, 308 MB):
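As a rough sketch, that sweet-spot method (Get-Content with -ReadCount 100) looks like this; the file name is a placeholder:
$count = 0
# Read 100 lines at a time; each pipeline object is an array of up to 100 lines.
Get-Content '.\big.txt' -ReadCount 100 | ForEach-Object { $count += $_.Count }
$count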
Here's another solution that uses .NET (a sketch of the approach, with a placeholder path):
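# Stream the file with .NET and let LINQ count the lines; nothing is buffered in memory.
[Linq.Enumerable]::Count([System.IO.File]::ReadLines('C:\path\to\big.txt'))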
It's not very interruptible, but it's very easy on memory.
Here is something I wrote to try to lessen the memory usage when parsing out the white-space in my txt file. With that said, the memory usage still gets kind of high, but the process takes less time to run.
Just to give you some background on my file: it had over 2 million records, with white space at both the front and rear of each line. I believe the total time was 5+ minutes.
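A sketch of that idea, assuming the goal is to trim the white space from each line while streaming (the paths are placeholders):
$reader = New-Object System.IO.StreamReader 'C:\path\to\input.txt'
$writer = New-Object System.IO.StreamWriter 'C:\path\to\output.txt'
while ($null -ne ($line = $reader.ReadLine())) {
    # Trim leading and trailing white space, then write the cleaned line out.
    $writer.WriteLine($line.Trim())
}
$reader.Close()
$writer.Close()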
Here is a one-liner based on Pseudothink's post; the one-liners shown below are sketches reconstructed from the explanation that follows.
Rows in one specific file:
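# Counts rows in the named file, reading 1000 lines at a time.
"the_name_of_your_file.txt" |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% {$c += $_.Count}; "$n; $c"}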
All files in current dir (individually):
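# Same pipeline, but Get-ChildItem feeds every file in the current directory.
Get-ChildItem "." |% {$n = $_; $c = 0; Get-Content -Path $_ -ReadCount 1000 |% {$c += $_.Count}; "$n; $c"}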
Explanation:
"the_name_of_your_file.txt"
-> does nothing, just provides the filename for the next steps; needs to be double quoted
|%
-> alias of ForEach-Object; iterates over the items provided (just one in this case), accepts piped content as input, and saves the current item to $_
$n = $_
-> the name of the provided file is saved from $_ for later (actually, this may not be needed)
$c = 0
-> initialisation of $c as the count
Get-Content -Path $_ -ReadCount 1000
-> reads the provided file 1000 lines at a time (see the other answers in this thread)
|%
-> for each chunk read, adds the number of rows actually read to $c (the total accumulates like 1000 + 1000 + 123)
"$n; $c"
-> once the file has finished reading, prints the name of the file and the count of rows
Get-ChildItem "."
-> just adds more items to the pipe than the single filename did

Use Get-Content -Read $nLinesAtTime to read your file part by part; a sketch with placeholder names follows:
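$YOURFILE = '.\big.txt'   # placeholder path
$nLinesAtTime = 1000      # placeholder chunk size
$nlines = 0
Get-Content $YOURFILE -ReadCount $nLinesAtTime | ForEach-Object { $nlines += $_.Count }
"Lines in ${YOURFILE}: $nlines"

And here is a simple, but slow script to validate the work on a small file:

# Slow but simple: pipe every line through Measure-Object and read its Lines property.
(Get-Content $YOURFILE | Measure-Object -Line).Lines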
The first thing to try is to stream Get-Content and build up the line count one line at a time, rather than storing all the lines in an array at once. I think this will give proper streaming behavior - i.e. the entire file will not be in memory at once, just the current line. A minimal sketch (the file name is a placeholder):
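$lines = 0
# Each line streams through the pipeline individually, so memory stays flat.
Get-Content '.\big.txt' | ForEach-Object { $lines++ }
$lines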
And as the other answer suggests, adding -ReadCount could speed this up.
If that doesn't work for you (too slow or too much memory), you could go directly to a StreamReader (a sketch with a placeholder path):
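$count = 0
$reader = New-Object System.IO.StreamReader 'C:\path\to\big.txt'
# ReadLine returns $null at end of file.
while ($null -ne $reader.ReadLine()) { $count++ }
$reader.Close()
$count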