Here's something I've been having some difficulty with. I have a local client-side script that needs to let a user fetch a remote web page and search the resulting page for forms. To do this without regular expressions, I need to parse the document into a fully traversable DOM object.
Some limitations I'd like to stress:
- I don't want to use libraries (like jQuery). There's too much bloat for what I need to do here.
- Under no circumstances should scripts from the remote page be executed (for security reasons).
- DOM APIs, such as getElementsByTagName, need to be available.
- It only needs to work in Internet Explorer, but in 7 at the very least.
- Let's pretend I don't have access to a server. I do, but I can't use it for this.
What I've tried
Assuming I have a complete HTML document string (including DOCTYPE declaration) in the variable html, here's what I've tried so far:
var frag = document.createDocumentFragment(),
    div = frag.appendChild(document.createElement("div"));

div.outerHTML = html;
//-> results in an empty fragment

div.insertAdjacentHTML("afterEnd", html);
//-> HTML is not added to the fragment

div.innerHTML = html;
//-> Error (expected, but I tried it anyway)

var doc = new ActiveXObject("htmlfile");
doc.write(html);
doc.close();
//-> JavaScript executes
I've also tried extracting the <head> and <body> nodes from the HTML and adding them to an <html> element inside the fragment, but still no luck.
Does anyone have any ideas?
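Assuming the HTML is also well-formed XML, you can use loadXML() on an MSXML DOM document (for example, one created with new ActiveXObject("Microsoft.XMLDOM")). An XML parser builds a traversable tree without running any scripts in the markup.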
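
Here is a minimal sketch of how that might look, assuming the page really is well-formed XHTML. The helper names parseXhtml and listFormActions are made up for illustration, and the optional createDoc argument is only there so the logic can be tried outside Internet Explorer. Turning off validateOnParse and resolveExternals is my assumption about how to stop MSXML from fetching the DTD named in the DOCTYPE; named entities such as &nbsp; may still make the parse fail.

```javascript
// Hypothetical helper: parse a well-formed XHTML string with MSXML.
// createDoc is an optional factory (an assumption for testability);
// in IE the default creates a real MSXML document.
function parseXhtml(html, createDoc) {
    var xml = createDoc ? createDoc() : new ActiveXObject("Microsoft.XMLDOM");

    xml.async = false;             // parse synchronously
    xml.validateOnParse = false;   // do not validate against the DTD
    xml.resolveExternals = false;  // do not fetch the external DTD

    // loadXML returns false when the string is not well-formed XML.
    if (!xml.loadXML(html)) {
        throw new Error("loadXML failed: " + xml.parseError.reason);
    }
    return xml;
}

// Hypothetical helper: collect the action attribute of every <form>.
// XML is case-sensitive, so "form" will not match <FORM>.
function listFormActions(xml) {
    var forms = xml.getElementsByTagName("form"),
        actions = [],
        i;

    for (i = 0; i < forms.length; i++) {
        actions.push(forms.item(i).getAttribute("action"));
    }
    return actions;
}
```

In IE this would be called as listFormActions(parseXhtml(html)). The result is an XML DOM rather than an HTML DOM, so getElementsByTagName and getAttribute are available, but HTML-only conveniences such as document.forms are not.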