Get all lines containing a string in a huge text f

Posted 2020-06-01 07:05

Question:

In PowerShell, how can I read, as fast as possible, the last line (or all the lines) containing a specific string in a huge text file (about 200,000 lines / 30 MB)? I'm using:

get-content myfile.txt | select-string -pattern "my_string" -encoding ASCII | select -last 1

But it is very slow (about 16-18 seconds). I also tested without the last pipe, "select -last 1", and it takes the same time.

Is there a faster way to get the last occurrence (or all occurrences) of a specific string in a huge file?

Perhaps this is simply the time it needs. Or is there any possibility of reading the file faster from the end, since I want the last occurrence? Thanks

Answer 1:

Try this:

get-content myfile.txt -ReadCount 1000 |
 foreach { $_ -match "my_string" }

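That reads your file in chunks of 1000 lines at a time and finds the matches in each chunk. With -ReadCount, each object sent down the pipeline is an array of up to 1000 lines, and -match applied to an array returns the elements that match. This gives you better performance because far fewer objects pass through the pipeline, so less CPU time is spent on per-object overhead and memory management.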
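Since the question asks for the last occurrence, the chunked read above can be wrapped so that only the final match is returned. This is a minimal sketch, not a benchmarked solution: the function name Get-LastMatchingLine and its parameters are hypothetical, and the chunk size of 1000 is simply carried over from the answer. The @($_) wrapper is a defensive assumption so that -match always filters a collection rather than returning a Boolean for a single string.

```powershell
# Hypothetical helper: returns the last line of $Path that matches the regex $Pattern.
function Get-LastMatchingLine {
    param([string]$Path, [string]$Pattern)

    Get-Content $Path -ReadCount 1000 |
        # Each $_ is a chunk of lines; -match on a collection returns the matching elements.
        ForEach-Object { @($_) -match $Pattern } |
        # Keep only the final match across all chunks.
        Select-Object -Last 1
}
```

Usage: Get-LastMatchingLine -Path myfile.txt -Pattern "my_string"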



Answer 2:

Have you tried:

gc myfile.txt | % { if($_ -match "my_string") {write-host $_}}

Or, you can create a "grep"-like function:

function grep($f,$s) {
    gc $f | % {if($_ -match $s){write-host $_}}
}

Then you can just issue: grep myfile.txt "my_string"



Answer 3:

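# .NET resolves relative paths against the process working directory,
# not the current PowerShell location, so resolve the full path first.
$path = (Resolve-Path "myfile.txt").ProviderPath
$reader = New-Object System.IO.StreamReader($path)

# A generic list avoids re-allocating the array on every += append.
$lines = New-Object 'System.Collections.Generic.List[string]'

try {
    while (!$reader.EndOfStream) {
        $line = $reader.ReadLine()
        # Plain substring test (case-sensitive, no regex).
        if ($line.Contains("my_string")) {
            $lines.Add($line)
        }
    }
}
finally {
    # Always release the file handle.
    $reader.Dispose()
}

$lines | Select-Object -Last 1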


Answer 4:

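Have you tried using [System.IO.File]::ReadAllLines()? This method is more "raw" than the PowerShell-esque method, since we're plugging directly into the Microsoft .NET Framework types. ReadAllLines needs the path of the file to read, and the pattern has to be tested against each line to get the matching lines back.

# Resolve the full path, because .NET does not use the current PowerShell location.
$Lines = [System.IO.File]::ReadAllLines((Resolve-Path 'myfile.txt').ProviderPath)
# Keep only the lines that match the regular expression.
$Lines | Where-Object { [Regex]::IsMatch($_, 'my_string_pattern') }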
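ReadAllLines loads the whole file into memory before any matching starts. If only the last occurrence is needed, a streaming variant may be worth trying. The sketch below assumes PowerShell 3.0 or later (it relies on [System.IO.File]::ReadLines, available from .NET Framework 4), and the function name Get-LastLineContaining and its parameters are hypothetical. It does a plain, case-sensitive substring test rather than a regex match, and it has not been timed against the other answers.

```powershell
# Hypothetical helper: returns the last line of $Path that contains the literal $Text.
function Get-LastLineContaining {
    param([string]$Path, [string]$Text)

    # .NET resolves relative paths against the process working directory,
    # so convert to a full provider path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $last = $null
    # File.ReadLines enumerates the file lazily, one line at a time.
    foreach ($line in [System.IO.File]::ReadLines($fullPath)) {
        if ($line.Contains($Text)) { $last = $line }
    }
    $last
}
```

Usage: Get-LastLineContaining -Path myfile.txt -Text "my_string"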


Tags: powershell