I am trying to access the document of an Internet Explorer COM object on Windows Server 2012. The code works great on Windows Server 2008, but as soon as I try to run it on Windows Server 2012 (fresh install, tried on more than one server), the same code stops working. In other words, $ie.document.documentHtml returns null.
Below is the code:
$ie = new-object -com "InternetExplorer.Application"
$ie.navigate2("http://www.example.com/")
while($ie.busy) {start-sleep 1}
$ie.document.documentHtml.innerhtml
Has the InternetExplorer COM object changed in Windows Server 2012? And if so, how do I retrieve the document contents on Windows Server 2012?
edit: Added a bounty to sweeten things up. Invoke-WebRequest is nice, but it only works on Windows 2012; I need to use Internet Explorer and have it work on both Windows 2008 and Windows 2012. I have read somewhere that installing Microsoft Office solves the issue, but that is not an option either.
edit2: as I need to remotely invoke the script on multiple Windows servers (both 2008 and 2012), I would prefer not to copy files manually.
It's a known bug: http://connect.microsoft.com/PowerShell/feedback/details/764756/powershell-v3-internetexplorer-application-issue
An extract from the workaround:
So, here's a workaround:
- Copy Microsoft.mshtml.dll from a location where it is installed (e.g. C:\Program Files (x86)\Microsoft.NET\Primary Interop Assemblies) to your script's location (this can be a network drive).
- Use the Load-Assembly.ps1 script (available at: http://sdrv.ms/U6j7Wn) to load the assembly types into memory, e.g.: .\Load-Assembly.ps1 -Path .\microsoft.mshtml.dll
Then proceed as usual to create the IE object. Warning: when calling the write() and writeln() methods, use the backward-compatible IHTMLDocument2_write() and IHTMLDocument2_writeln() methods instead.
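If you would rather not depend on the separate Load-Assembly.ps1 script, a minimal sketch is to load the copied interop assembly with the built-in Add-Type cmdlet before creating the IE object. The function name Import-MshtmlInterop and its true/false return convention are hypothetical, it assumes Microsoft.mshtml.dll has already been copied somewhere the script can reach, and it is not guaranteed to match everything Load-Assembly.ps1 does.

```powershell
# Hypothetical helper (not part of the Connect workaround): loads the MSHTML
# primary interop assembly so its types are in memory before
# InternetExplorer.Application is created.
function Import-MshtmlInterop {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        # Nothing to load; let the caller decide whether to continue without it
        Write-Warning "Interop assembly not found: $Path"
        return $false
    }
    # Add-Type -Path accepts a compiled .dll and loads it into the session
    Add-Type -Path $Path
    return $true
}
```

Call it as Import-MshtmlInterop -Path .\Microsoft.mshtml.dll, then create the object with New-Object -ComObject "InternetExplorer.Application" as usual.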
$ie.document.documentHtml.innerhtml
The bigger question is how this could ever have worked. The Document property returns a reference to the IHTMLDocument interface, which does not have a "documentHtml" property. It is never entirely clear what you might get back when you use late binding, as this code does. There is an old documentHtml property supported by the DHTML Editing control, which has been firmly put out to pasture. Admittedly, that is rather a wild guess.
Anyhoo, the correct syntax is to use, say, the body property:
$ie = new-object -com "InternetExplorer.Application"
$ie.navigate2("http://www.example.com/")
while($ie.busy) {start-sleep 1}
$txt = $ie.document.body.innerhtml
Write-Output $txt
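Polling only Busy may return before the document has finished loading (for example, if Busy has not yet become true right after navigate2). A slightly more defensive sketch also waits for ReadyState to reach 4 (READYSTATE_COMPLETE) and gives up after a timeout. The function name Wait-Browser and the 30-second default are assumptions for illustration, not part of the answer above.

```powershell
# Hypothetical helper: blocks until the browser is idle and the page is complete,
# or throws once the timeout has elapsed.
function Wait-Browser {
    param(
        [Parameter(Mandatory = $true)]
        $Browser,
        [int]$TimeoutSeconds = 30
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    # 4 = READYSTATE_COMPLETE for IWebBrowser2.ReadyState
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -gt $deadline) {
            throw "Page did not finish loading within $TimeoutSeconds seconds."
        }
        Start-Sleep -Milliseconds 200
    }
}
```

It replaces the while loop: call $ie.navigate2("http://www.example.com/"), then Wait-Browser $ie, then read $ie.document.body.innerhtml.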
If you still have problems (PowerShell handles null references in a way that is hard to diagnose), try running this C# code on the machine. It ought to give you a better error message:
using System;

class Program {
    static void Main(string[] args) {
        try {
            // Late-bind to the Internet Explorer COM server by its ProgID
            var comType = Type.GetTypeFromProgID("InternetExplorer.Application");
            dynamic browser = Activator.CreateInstance(comType);
            browser.Navigate2("http://example.com");
            // Poll until navigation finishes
            while (browser.Busy) System.Threading.Thread.Sleep(1);
            dynamic doc = browser.Document;
            Console.WriteLine(doc.Body.InnerHtml);
        }
        catch (Exception ex) {
            // The full exception text is more informative than a silent null in PowerShell
            Console.WriteLine(ex.ToString());
        }
        // Keep the console window open
        Console.ReadLine();
    }
}
As far as I can tell, on Windows Server 2012 the way to get the full HTML of a page is:
$ie.document.documentElement.outerhtml
There is also an innerhtml property on the documentElement, which strips off the root <html> element.
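Since the question needs one script for both Windows Server 2008 and 2012, a small hypothetical helper can try documentElement.outerHTML first, fall back to body.innerHTML, and fail loudly when the document itself comes back null (the symptom described in the question). The function name Get-DocumentHtml and the fallback order are assumptions; this sketch has not been verified on either server version.

```powershell
# Hypothetical helper: returns the page markup from an MSHTML document object,
# preferring the full <html> markup and falling back to the body.
function Get-DocumentHtml {
    param($Document)
    if ($null -eq $Document) {
        # Surface the null document explicitly instead of silently returning nothing
        throw 'Document is null; the IE COM object did not expose a document.'
    }
    # Full markup including the root <html> element, when available
    $root = $Document.documentElement
    if ($null -ne $root -and $null -ne $root.outerHTML) {
        return $root.outerHTML
    }
    # Fallback: inner markup of the body element only
    $body = $Document.body
    if ($null -ne $body) {
        return $body.innerHTML
    }
    return $null
}
```

Usage: $html = Get-DocumentHtml $ie.document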
Of course, if all you want to do is get the raw markup, consider using Invoke-WebRequest:
$doc = Invoke-WebRequest 'http://www.example.com'
$doc.Content