Get-Content -wait not working as described in the

2019-01-12 02:05发布

问题:

I've noticed that when Get-Content path/to/logfile -Wait, the output is actually not refreshed every second as the documentation explains it should. If I go in Windows Explorer to the folder where the log file is and Refresh the folder, then Get-Content would output the latest changes to the log file.

If I try tail -f with cygwin on the same log file (not at the same time than when trying get-content), then it tails as one would expect, refreshing real time without me having to do anything.

Does anyone have an idea why this happens?

回答1:

Edit: Bernhard König reports in the comments that this has finally been fixed in Powershell 5.

You are quite right. The -Wait option on Get-Content waits until the file has been closed before it reads more content. It is possible to demonstrate this in Powershell, but can be tricky to get right as loops such as:

while (1){
get-date | add-content c:\tesetfiles\test1.txt 
Start-Sleep -Milliseconds 500
}

will open and close the output file every time round the loop.

To demonstrate the issue open two Powershell windows (or two tabs in the ISE). In one enter this command:

PS C:\> 1..30 | % { "${_}: Write $(Get-Date -Format "hh:mm:ss")"; start-sleep 1 } >C:\temp\t.txt

That will run for 30 seconds writing 1 line into the file each second, but it doesn't close and open the file each time.

In the other window use Get-Content to read the file:

get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }

With the -Wait option you need to use Ctrl+C to stop the command so running that command 3 times waiting a few seconds after each of the first two and a longer wait after the third gave me this output:

PS C:\> get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }
8: Write 12:15:09 read at 12:15:09

PS C:\> get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }
13: Write 12:15:14 read at 12:15:15

PS C:\> get-content c:\temp\t.txt -tail 1 -wait | % { "$_ read at $(Get-Date -Format "hh:mm:ss")" }
19: Write 12:15:20 read at 12:15:20
20: Write 12:15:21 read at 12:15:32
21: Write 12:15:22 read at 12:15:32
22: Write 12:15:23 read at 12:15:32
23: Write 12:15:24 read at 12:15:32
24: Write 12:15:25 read at 12:15:32
25: Write 12:15:26 read at 12:15:32
26: Write 12:15:27 read at 12:15:32
27: Write 12:15:28 read at 12:15:32
28: Write 12:15:29 read at 12:15:32
29: Write 12:15:30 read at 12:15:32
30: Write 12:15:31 read at 12:15:32

From this I can clearly see:

  1. Each time the command is run it gets the latest line written to the file. i.e. There is no problem with caching and no buffers needing flushed.
  2. Only a single line is read and then no further output appears until the command running in the other window completes.
  3. Once it does complete all of the pending lines appear together. This must have been triggered by the source program closing the file.

Also when I repeated the exercise with the Get-Content command running in two other windows one window read line 3 then just waited, the other window read line 6, so the line is definitely being written to the file.

It seems pretty conclusive that the -Wait option is waiting for a file close event, not waiting for the advertised 1 second. The documentation is wrong.

Edit: I should add, as Adi Inbar seems to insistent that I'm wrong, that the examples I gave here use Powershell only as that seemed most appropriate for a Powershell discussion. I did also verify using Python that the behaviour is exactly as I described:

Content written to a file is readable by a new Get-Content -Wait command immediately provided the application has flushed its buffer.

A Powershell instance using Get-Content -Wait will not display new content in the file that is being written even though another Powershell instance, started later, sees the later data. This proves conclusively that the data is accessible to Powershell and Get-Content -Wait is not polling at 1 second intervals but waiting for some trigger event before it next looks for data.

The size of the file as reported by dir is updating while lines are being added, so it is not a case of Powershell waiting for the directory entry size to be updated.

When the process writing the file closes it, the Get-Content -Wait displays the new content almost instantly. If it were waiting until the data was flushed to disk there would be up to a delay until Windows flushed it's disk cache.

@AdiInbar, I'm afraid you don't understand what Excel does when you save a file. Have a closer look. If you are editing test.xlsx then there is also a hidden file ~test.xlsx in the same folder. Use dir ~test.xlsx -hidden | select CreationTime to see when it was created. Save your file and now test.xlsx will have the creation time from ~test.xlsx. In other words saving in Excel saves to the ~ file then deletes the original, renames the ~ file to the original name and creates a new ~ file. There's a lot of opening and closing going on there.

Before you save it has the file you are looking at open, and after that file is open, but its a different file. I think Excel is too complex a scenario to say exactly what triggers Get-Content to show new content but I'm sure you mis-interpreted it.



回答2:

It looks like Powershell is monitoring the file's Last Modified property. The problem is that "for performance reasons" the NTFS metadata containing this property is not automatically updated except under certain circumstances.

One cirumstance is when the file handle is closed (hence @Duncan's observations). Another is when the file's information is queried directly, hence the Explorer refresh behaviour mentioned in the question.

You can observe the correlation by having Powershell monitoring a log with Get-Content -Wait and having Explorer open in the folder in details view with Last Modified column visible. Notice that Last Modified doesn't update automatically as the file is modified.

Now get the properties of the file in another window. E.g. at a command prompt, type the file. Or open another Explorer window in the same folder, and right-click the file and get its properties (for me, just right-clicking is enough). As soon as you do that, the first Explorer window will automatically update the Last Modified column and Powershell will notice the update and catch up with the log. In Powershell, touching the LastWriteTime property is enough:

(Get-Item file.log).LastWriteTime = (Get-Item file.log).LastWriteTime

or

(Get-Item file.log).LastWriteTime = Get-Date

So this is now working for me:

Start-Job {
  $f=Get-Item full\path\to\log
  while (1) {
    $f.LastWriteTime = Get-Date
    Start-Sleep -Seconds 10
  }
}
Get-Content path\to\log -Wait


回答3:

Can you tell us how to reproduce that?

I can start this script on one PS session:

get-content c:\testfiles\test1.txt -wait

and this in another session:

while (1){
get-date | add-content c:\tesetfiles\test1.txt 
Start-Sleep -Milliseconds 500
}

And I see the new entries being written in the first session.



回答4:

It appears that get-content only works if it goes through the windows api and that versions of appending to a file are different.

program.exe > output.txt

And then

get-content output.txt -wait

Will not update. But

program.exe | add-content output.txt

will work with.

get-content output.txt -wait    

So I guess it depends on how the application does output.



回答5:

I can assure you that Get-Content -Wait does refresh every second, and shows you changes when the file changes on the disk. I'm not sure what tail -f is doing differently, but based on your description I'm just about certain that this issue is not with PowerShell but with write caching. I can't rule out the possibility that log4net is doing the caching, but I strongly suspect that OS-level caching is the culprit, for two reasons:

  1. The documentation for log4j/log4net says that it flushes the buffer after every append operation by default, and I presume that if you had explicitly configured it not to flush after every append, you'd be aware of that.
  2. I know for a fact that refreshing Windows Explorer triggers a write buffer flush if any files in the directory have changed. That's because it actually reads the file contents, not just the metadata, in order to provide extended information such as thumbnails and previews, and the read operation causes the write buffer to flush. So, if you're seeing the delayed updates every time you refresh the logfile's directory in Windows Explorer, that points strongly in this direction.

Try this: Open Device Manager, expand the Disk Drives node, open the Properties of the disk on which the logfile is stored, switch to the Policies tab, and uncheck Enable write caching on the device. I think you'll find that Get-Content -Wait will now show you the changes as they happen.

As for why tail -f is showing you the changes immediately as it is, I can only speculate. Maybe you're using it to monitor a logfile on a different drive, or perhaps Cygwin requests frequent flushes while you're running tail -f, to address this very issue.


UPDATE:

Duncan commented below that it is an issue with PowerShell, and posted an answer contending that Get-Content -Wait doesn't output new results until the file is closed, contrary to the documentation.

However, based on information already established and further testing, I've confirmed conclusively that it does not wait for the file to be closed, but outputs new data added to the file as soon as it's written to disk, and that the issue the OP is seeing is almost definitely due to write buffering.

To prove this, let the facts be submitted to a candid world:

  • I created an Excel spreadsheet, and ran Get-Content -Wait against the .xlsx file. When I entered new data into the spreadsheet, the Get-Content -Wait did not produce new output, which is expected while the new information is only in RAM and not on disk. However, whenever I saved the spreadsheet after adding data, new output was produced immediately.

    Excel does not close the file when you save it. The file remains open until you close the Window from Excel, or exit Excel. You can verify this by trying to delete, rename, or otherwise modify the .xlsx file after you've saved it, while the window is still open in Excel.

  • The OP stated that he gets new output when he refreshes the folder in Windows Explorer. Refreshing the folder listing does not close the file. It does flush the write buffer if any of the files have changed. That's because it has to read the file's attributes, and this operation flushes the write buffer. I'll try to find some references for this, but as I noted above, I know for a fact that this is true.

  • I verified this behavior by running the following modified version of Duncan's test, which runs for 1,000 iterations instead of 50, and displays progress at the console so that you can track exactly how the output in your Get-Content -Wait window relates to the data that the pipeline has added to the file:

    1..1000 | %{"${_}: Write $(Get-Date -Format "hh:mm:ss")"; Write-Host -NoNewline "$_..."; Start-Sleep 1} > .\gcwtest.txt
    

    While this was running, I ran Get-Content -Wait .\gcwtest.txt in another window, and opened the directory in Windows Explorer. I found that if I refresh, more output is produced any time the file size in KB changes, and sometimes but not always even if nothing visible has changed. (More on the implications of that inconsistency later...)

  • Using the same test, I opened a third PowerShell window, and observed that all of the following trigger an immediate update in the Get-Content -Wait listing:

    • Listing the file's contents with plain old Get-Content .\gcwtest.txt

    • Reading any of the file's attributes. However, for attributes that don't change, only the first read triggers an update.

      For example, (gi .\gcwtest.txt).lastwritetime triggers more output multiple times. On the other hand, (gi .\gcwtest.txt).mode or (gi .\gcwtest.txt).directory trigger more output the first time each, but not if you repeat them. Also note the following:

      »   This behavior is not 100% consistent. Sometimes, reading Mode or Directory doesn't trigger more output the first time, but it does if you repeat the operation. All subsequent repetitions after the first one that triggers updated output have no effect.

      »   If you repeat the test, reading attributes that are the same does not trigger output, unless you delete the .txt file before running the pipeline again. In fact, sometimes even (gi .\gcwtest.txt).lastwritetime doesn't trigger more output if you repeat the test without deleting gcwtest.txt.

      »   If you issue (gi .\gcwtest.txt).lastwritetime multiple times in one second, only the first one triggers output, i.e. only when the result has changed.

    • Opening the file in a text editor. If you use an editor that keeps the file handle open (notepad does not), you'll see that closing the file without saving does not cause Get-Content -Wait to output the lines added by the pipeline since you opened the file in the editor.

    • Tab-completing the file's name

  • After you try any of the tests above a few times, you many find that Get-Content -Wait outputs more lines periodically for the remainder of the pipeline's execution, even if you don't do anything. Not one line at a time, but in batches.

  • The inconsistency in behavior itself points to buffer flushing, which occurs according to variable criteria that are hard to predict, as opposed to closing, which occurs under clear-cut and consistent circumstances.

Conclusion: Get-Content -Wait works exactly as advertised. New content is displayed as soon as it's physically written to the file on disk*.

It should be noted that my suggestion to disable write caching on the drive did not for the test above, i.e. it did not result in `Get-Content -Wait displaying new lines as soon as they're added to the text file by the pipeline, so perhaps the buffering responsible for the output latency is occurring on a filesystem or OS level as opposed to the disk's write cache. However, write buffering is clearly the explanation for the behavior observed in the OP's question.

* I'm not going to get into this in detail, since it's out of the scope of the question, but Get-Content -Wait does behave oddly if you add content to the file not at the end. It displays data from the end of the file equal in size to the amount of data added. The newly displayed data generally repeats data that was previously displayed, and may or may not include any of the new data, depending on whether the size of the new data exceeds the size of the data that follows it.



回答6:

I ran in to the same issue while trying to watch WindowsUpdate.log in realtime. While not ideal, the code below allowed me to monitor the progress. -Wait didn't work due to the same file-writing limitations discussed above.

Displays the last 10 lines, sleeps for 10 seconds, clears the screen and then displays the last 10 again. CTRL + C to stop stream.

 while(1){
Get-Content C:\Windows\WindowsUpdate.log -tail 10 
    Start-Sleep -Seconds 10
    Clear 
    }