How to disable JavaScript in mshtml.HTMLDocument

Posted 2019-06-20 07:17

I have code like this:

    Dim Document As New mshtml.HTMLDocument
    Dim iDoc As mshtml.IHTMLDocument2 = CType(Document, mshtml.IHTMLDocument2)
    iDoc.write(html)
    iDoc.close()

However, when I load HTML this way, it executes all the JavaScript in it and also requests resources referenced by the "html" markup.

I want to disable JavaScript and all other popups (such as certificate error dialogs).

My aim is to use the DOM of the mshtml document to extract some tags from the HTML reliably (instead of a bunch of regexes).

Or is there another IE/Office DLL that can just load HTML without my having to worry about IE-related popups or active scripts?

4 answers
Luminary・发光体
Post #2 · 2019-06-20 07:43

    Dim Document As New mshtml.HTMLDocument
    Dim iDoc As mshtml.IHTMLDocument2 = CType(Document, mshtml.IHTMLDocument2)
    ' Add this line: switch the document to design mode before writing the markup
    iDoc.designMode = "On"
    iDoc.write(html)
    iDoc.close()
可以哭但决不认输i
Post #3 · 2019-06-20 07:43

If I remember correctly, MSHTML automatically inherits Internet Explorer's settings.

So if you disable JavaScript in Internet Explorer for the user who is executing the code, JavaScript shouldn't run in MSHTML either.

孤傲高冷的网名
Post #4 · 2019-06-20 07:48

It sounds like you're screen-scraping some resource and then trying to do something programmatically with the resulting HTML?

If you know ahead of time that it is valid XHTML, load the XHTML string (which is really XML) into an XmlDocument object and work with it that way.
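
A minimal VB.NET sketch of that approach, assuming the input really is well-formed XHTML that declares the standard XHTML namespace (http://www.w3.org/1999/xhtml); the module and function names (XhtmlLinkExtractor, ExtractHrefs) are hypothetical. Because the string is only parsed as XML and never rendered, no script runs and nothing referenced by the markup is requested. Setting XmlResolver to Nothing also stops the parser from fetching an external DTD, with the likely side effect that DTD-only entities such as &nbsp; make LoadXml throw.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module XhtmlLinkExtractor
    ' Namespace URI used by XHTML 1.x documents (assumed present in the input).
    Private Const XhtmlNs As String = "http://www.w3.org/1999/xhtml"

    ' Parses well-formed XHTML as plain XML and returns every <a href="..."> value.
    ' Nothing is rendered, so scripts never run and no resources are fetched.
    Public Function ExtractHrefs(xhtml As String) As List(Of String)
        Dim doc As New XmlDocument()
        ' Do not resolve external DTDs or entities (e.g. the W3C XHTML DTD).
        doc.XmlResolver = Nothing
        ' Throws XmlException if the markup is not well-formed XML.
        doc.LoadXml(xhtml)

        ' XHTML elements live in a namespace, so the XPath needs a bound prefix.
        ' Markup without the xmlns declaration would match nothing here.
        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ns.AddNamespace("x", XhtmlNs)

        Dim hrefs As New List(Of String)()
        For Each node As XmlNode In doc.SelectNodes("//x:a[@href]", ns)
            hrefs.Add(DirectCast(node, XmlElement).GetAttribute("href"))
        Next
        Return hrefs
    End Function

    Sub Main()
        Dim page As String =
            "<html xmlns=""http://www.w3.org/1999/xhtml""><body>" &
            "<script>alert('never runs')</script>" &
            "<a href=""/first"">one</a><a name=""anchor"">no href</a>" &
            "<a href=""/second"">two</a></body></html>"
        For Each href As String In ExtractHrefs(page)
            Console.WriteLine(href)
        Next
    End Sub
End Module
```

Running it prints /first and /second; the <script> element is just another XML node and is never executed.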

Otherwise, if it is potentially invalid or malformed HTML, you'll need something like hpricot (though that is a Ruby library).

萌系小妹纸
Post #5 · 2019-06-20 07:57

If you already have the 'html' as a string and just want access to a DOM view of it, why "render" it in a browser control at all?

I'm not familiar with .NET technology, but there must be some sort of StringToDOM/StringToJSON-type facility that would better suit your needs.

Likewise, if the 'html' variable you are using above is a URL, just use wget or similar to retrieve the markup as a string, and parse it with an appropriate tool.

I'd look for a .NET XML/DOM library and use that (again, I would expect this to be part of the language, but I'm not sure).
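
For what it's worth, the .NET Framework does ship such a library: LINQ to XML in System.Xml.Linq, whose XDocument.Parse turns a string into an in-memory tree without rendering anything. A minimal sketch, assuming the markup is well-formed XML (tag soup will make XDocument.Parse throw); the module name StringToDomDemo and the function TextOfElements are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module StringToDomDemo
    ' Parses a markup string into an XML tree and returns the text content of
    ' every element with the given local name, ignoring any namespace.
    Public Function TextOfElements(markup As String, localName As String) As String()
        Dim doc As XDocument = XDocument.Parse(markup)
        Dim query = From el In doc.Descendants()
                    Where el.Name.LocalName = localName
                    Select el.Value
        Return query.ToArray()
    End Function

    Sub Main()
        Dim markup As String =
            "<html><head><title>Example</title></head>" &
            "<body><h1>Heading</h1><p>First</p><p>Second</p></body></html>"
        For Each text As String In TextOfElements(markup, "p")
            Console.WriteLine(text)
        Next
    End Sub
End Module
```

Matching on Name.LocalName keeps the query working whether or not the document declares the XHTML namespace; running it prints First and Second.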

PS: after a quick Google search I found this (source). I'm not sure whether it would help if you used it in your HTMLDocument instead.

    // Polyfill: define DOMParser only where the browser lacks it (older IE).
    if (typeof DOMParser === 'undefined') {
        window.DOMParser = function () {};
        DOMParser.prototype.parseFromString = function (str, contentType) {
            var xmldata;
            if (typeof ActiveXObject !== 'undefined') {
                // Older IE: parse synchronously with the MSXML COM object.
                xmldata = new ActiveXObject('MSXML.DomDocument');
                xmldata.async = false;
                xmldata.loadXML(str);
                return xmldata;
            } else if (typeof XMLHttpRequest !== 'undefined') {
                // Otherwise: synchronously "download" the string from a data: URI
                // and let the browser's XML parser build responseXML.
                xmldata = new XMLHttpRequest();
                if (!contentType) {
                    contentType = 'application/xml';
                }
                xmldata.open('GET', 'data:' + contentType + ';charset=utf-8,' + encodeURIComponent(str), false);
                if (xmldata.overrideMimeType) {
                    xmldata.overrideMimeType(contentType);
                }
                xmldata.send(null);
                return xmldata.responseXML;
            }
        };
    }