InternetExplorer.Application com object and window

Posted 2019-08-09 04:12

Question:

I am trying to access the document of an Internet Explorer COM object on Windows 2012. The code works fine on Windows 2008, but as soon as I run it on Windows 2012 (fresh install, tried on more than one server), the same code stops working: $ie.document.documentHtml returns null.

Below is the code:

$ie = new-object -com "InternetExplorer.Application"
$ie.navigate2("http://www.example.com/") 
while($ie.busy) {start-sleep 1}
$ie.document.documentHtml.innerhtml

Has the Internet Explorer COM object changed in Windows 2012? If so, how do I retrieve the document contents on Windows 2012?

Thanks in advance

edit: Added a bounty to sweeten things up. Invoke-WebRequest is nice, but it works only on Windows 2012; I need to use Internet Explorer and have the script work on both Windows 2008 and Windows 2012. I have read somewhere that installing Microsoft Office solves the issue. That is not an option either.

edit2: As I need to invoke the script remotely on multiple Windows servers (both 2008 and 2012), I would prefer not to copy files manually.

Answer 1:

It's a known bug: http://connect.microsoft.com/PowerShell/feedback/details/764756/powershell-v3-internetexplorer-application-issue

An extract from the workaround:

So, here's a workaround:

  1. Copy Microsoft.mshtml.dll from a location where it is installed (e.g. from C:\Program Files (x86)\Microsoft.NET\Primary Interop Assemblies) to your script's location (can be a network drive).
  2. Use the Load-Assembly.ps1 script (code provided below and at: http://sdrv.ms/U6j7Wn) to load the assembly types in memory, e.g.: .\Load-Assembly.ps1 -Path .\microsoft.mshtml.dll

Then proceed as usual to create the IE object etc. Warning: when dealing with the write() and writeln() methods use the backward compatible methods: IHTMLDocument2_write() and IHTMLDocument2_writeln().
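
The Load-Assembly.ps1 script itself is not reproduced in this thread. As a rough stand-in, the built-in Add-Type cmdlet can load a .NET assembly from a path. The sketch below assumes that is sufficient for this workaround; the function name Import-MshtmlAssembly and the example path are hypothetical, not part of the original script.

```powershell
# Hypothetical helper: loads the interop assembly so its types are available
# in the current session. This is NOT the original Load-Assembly.ps1.
function Import-MshtmlAssembly {
    param(
        [Parameter(Mandatory=$true)]
        [string] $Path
    )

    # Fail early with a readable message instead of a cryptic load error.
    if (-not (Test-Path -LiteralPath $Path)) {
        throw "Assembly not found: $Path"
    }

    # Add-Type wants a resolved path; relative paths can be misinterpreted.
    $full = (Resolve-Path -LiteralPath $Path).Path
    Add-Type -Path $full
}

# Usage (assumes the DLL was copied next to the script, as in step 1):
# Import-MshtmlAssembly -Path .\Microsoft.mshtml.dll
```

Load the assembly before creating the InternetExplorer.Application object so the mshtml types are already present when the document is first accessed.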



Answer 2:

    $ie.document.documentHtml.innerhtml

The bigger question is how this could ever have worked. The Document property returns a reference to the IHTMLDocument interface, which does not have a "documentHtml" property. With late binding, as used in this code, it is never entirely clear what you will get back. There is an old documentHtml property supported by the DHTML Editing control, which has been firmly put out to pasture. Admittedly, that is rather a wild guess.

Anyhoo, correct syntax is to use, say, the body property:

  $ie = new-object -com "InternetExplorer.Application"
  $ie.navigate2("http://www.example.com/") 
  while($ie.busy) {start-sleep 1}
  $txt = $ie.document.body.innerhtml
  Write-Output $txt
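
The Busy flag alone can flip to false before the document is fully loaded, so it may help to also check ReadyState (4 is READYSTATE_COMPLETE) and give up after a timeout rather than loop forever. This is a minimal sketch; the function name Wait-IEReady and its parameters are hypothetical.

```powershell
# Hypothetical helper: waits until the browser is idle and the page is complete.
# Returns $true when ready, $false if the timeout expires first.
function Wait-IEReady {
    param(
        [Parameter(Mandatory=$true)]
        $Browser,
        [int] $TimeoutSeconds = 30
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    # ReadyState 4 corresponds to READYSTATE_COMPLETE.
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -gt $deadline) { return $false }
        Start-Sleep -Milliseconds 200
    }
    return $true
}

# Usage with the object from the snippet above:
# $ie = New-Object -ComObject "InternetExplorer.Application"
# $ie.Navigate2("http://www.example.com/")
# if (Wait-IEReady -Browser $ie) { $ie.Document.body.innerHTML }
```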

If you still have problems (PowerShell treats null references in a way that is hard to diagnose), try running this C# code on the machine. It ought to give you a better error message:

using System;

class Program {
    static void Main(string[] args) {
        try {
            var comType = Type.GetTypeFromProgID("InternetExplorer.Application");
            dynamic browser = Activator.CreateInstance(comType);
            browser.Navigate2("http://example.com");
            while (browser.Busy) System.Threading.Thread.Sleep(1);
            dynamic doc = browser.Document;
            Console.WriteLine(doc.Body.InnerHtml);
        }
        catch (Exception ex) {
            Console.WriteLine(ex.ToString());
        }
        Console.ReadLine();
    }
}


Answer 3:

As far as I can tell, on Windows Server 2012 you can get the full HTML of a page with:

$ie.document.documentElement.outerhtml

There is also an innerhtml property on the documentElement, which strips off the root <html> element.
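
Since the original symptom was a silent null, a small wrapper that checks documentElement before reading it can make failures easier to spot. This is only a sketch; Get-DocumentHtml and its -Inner switch are hypothetical names.

```powershell
# Hypothetical helper: returns the page markup from a document object,
# throwing a clear error if documentElement is missing.
function Get-DocumentHtml {
    param(
        [Parameter(Mandatory=$true)]
        $Document,
        [switch] $Inner   # return markup without the root <html> element
    )

    $root = $Document.documentElement
    if ($null -eq $root) {
        throw "documentElement is null; the document may not be loaded or accessible."
    }

    if ($Inner) { return $root.innerHTML }
    return $root.outerHTML
}

# Usage:
# Get-DocumentHtml -Document $ie.Document          # full markup
# Get-DocumentHtml -Document $ie.Document -Inner   # without <html> wrapper
```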

Of course, if all you want to do is get the raw markup, consider using Invoke-WebRequest:

$doc = Invoke-WebRequest 'http://www.example.com'
$doc.Content