A JavaScript parser for DOM

Posted 2019-01-23 20:03

We have a special requirement in a project: we have to parse a string of HTML (from an AJAX response) on the client side, using JavaScript only. That's right, no parsing in PHP or Java! I've been going through Stack Overflow this entire week and have not yet found an acceptable solution.

Some more details on the requirements:

  • We can use any library (preferably Dojo and/or jQuery) or go native!

  • We need to parse an entire HTML document that we receive as a string, including the <head> and <body>.

  • We also need to serialise the parsed DOM structures back out to strings at times.

  • Finally, we don't want to append the parsed DOM to the current document. Rather, we'll send it back to the server for permanent storage.

E.g., we need something like:

var dom = HTMLtoDOM('<html><head><title> This is the old title. </title></head></html>');
    dom.getElementsByTagName('title')[0].innerHTML = "This is a new Title";

From my research, these are our options:

  1. A TinyMCE parser. Problem? I think we would have to include the whole editor. What about parsing HTML where we don't need an editor?

  2. John Resig's parser. It should be our best bet. Unfortunately, the parser crashes when the entire content of a page is given to it!

  3. The jQuery $(htmlString) or the dojo.toDom(htmlString). Both rely on DocumentFragment and hence gobble up <head> and <body>!

EDIT: We want to serialize the HTML so that we can catch certain custom HTML comments via RegExp. We also need to give users the opportunity to edit meta tags, title tags, etc., hence the HTML parser.

Oh, and I feel I will be murdered on Stack Overflow even if I just hint at parsing HTML via RegExp!!!
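
For the comment-catching step mentioned in the EDIT, a minimal sketch might look like the following. The `cms:` marker format and the `findCustomComments` name are made up for illustration, since the post does not say what the custom comments look like. It only scans the serialized string for comments and is not a substitute for a real parser.

```javascript
// Hypothetical marker format (an assumption, not from the post):
//   <!-- cms:KEY VALUE -->
// Returns every such marker found in a serialized HTML string.
function findCustomComments(html) {
  var re = /<!--\s*cms:(\w+)\s+([\s\S]*?)\s*-->/g;
  var found = [];
  var match;
  // With the "g" flag, exec() resumes after the previous match on each call.
  while ((match = re.exec(html)) !== null) {
    found.push({ key: match[1], value: match[2] });
  }
  return found;
}

var markers = findCustomComments(
  '<html><head><!-- cms:region header --><title>T</title></head>' +
  '<body><!-- plain note --></body></html>'
);
// `markers` now holds one entry per cms: comment; ordinary comments are skipped.
```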

5 answers
孤傲高冷的网名
#2 · 2019-01-23 20:09

If you want a full parser that doesn't rely on something already in the browser to bootstrap your interpreter, the HTML parser in dom.js is top notch. Its entire purpose is to parse HTML for use in a JavaScript-hosted DOM, so it has to cater to the DOM specifications as well as the need to parse and use the results in JS, all while not assuming any existing tools besides base JS. It even works in Node.js, SpiderMonkey's jsshell, or web workers. https://github.com/andreasgal/dom.js

It also has the serialization part, but to use that you'll need to commit to more than just the parser. You can, though, find standalone serializers that work with any DOM-like structure.

爷、活的狠高调
#3 · 2019-01-23 20:14

I would suggest a two-part solution: first read off the tags that jQuery will not parse for you, then pass the remainder into jQuery. If you're looking for a pure-JavaScript way to parse an HTML data structure, jQuery is probably your best bet, as it has many built-in functions to manipulate the data. You could build your parser as a jQuery plugin, called via $.parser or something of that nature. If you extend jQuery with your own function to parse the data, you can also return an extended jQuery object with functions that read specific data elements, even from the header, since you can manually parse the ... information and store it in the same object.
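
A rough sketch of the first half of that idea, assuming a regex pass is acceptable for pulling the <head> and <body> contents out of the string. The `splitHtmlDocument` name is hypothetical, and the approach is fragile: it assumes at most one <head> and one <body>, and that neither tag name shows up inside comments, scripts, or attribute values.

```javascript
// Hypothetical helper: extract the inner markup of <head> and <body> from a
// full document string so each part can be handled separately.
function splitHtmlDocument(html) {
  function inner(tag) {
    // \b keeps "<head" from matching "<header"; [\s\S]*? is a lazy
    // match across newlines up to the first closing tag.
    var re = new RegExp(
      '<' + tag + '\\b[^>]*>([\\s\\S]*?)<\\/' + tag + '\\s*>', 'i'
    );
    var match = re.exec(html);
    return match ? match[1] : '';
  }
  return { head: inner('head'), body: inner('body') };
}

var parts = splitHtmlDocument(
  '<html><head><title>Old title</title></head><body><p>Hi</p></body></html>'
);
// parts.head and parts.body are plain strings. In a page where jQuery is
// loaded, the body string could then be handed to $(parts.body).
```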

Melony?
#4 · 2019-01-23 20:17

You can leverage the current document without appending any nodes to it.

Try something like this:

function toNode(html) {
    var doc = document.createElement('html');
    doc.innerHTML = html;
    return doc;
}

var node = toNode('<html><head><title> This is the old title. </title></head></html>');

console.log(node);

http://jsfiddle.net/6SvqA/3/
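
To cover the serialization requirement too, the same detached element can be read back with outerHTML. The following is only a sketch: the `retitle` name and the optional `doc` parameter are illustrative additions (the parameter exists so the function can be exercised outside a page), and the exact string the browser returns may differ from the input, for example through an added empty <body>.

```javascript
// Parse a document string into a detached <html> element, change the title,
// and serialize the result back to a string. Nothing is appended to the page.
// Assumption: `doc` defaults to the browser's global `document`.
function retitle(html, newTitle, doc) {
  doc = doc || document;
  var root = doc.createElement('html'); // detached element, as in toNode()
  root.innerHTML = html;
  var titles = root.getElementsByTagName('title');
  if (titles.length) {
    titles[0].textContent = newTitle;
  }
  return root.outerHTML; // serialized markup, ready to send to the server
}

// Usage in a browser:
//   var out = retitle('<html><head><title>Old</title></head></html>',
//                     'This is a new Title');
```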

forever°为你锁心
#5 · 2019-01-23 20:22

If your HTML is well-formed XML (that is, XHTML), you can use jQuery's parseXML. Note that it throws an error on markup that is not well-formed XML.

var dom = $.parseXML(html);

$('title', dom).text("This is a new Title");

Edit:

If you want to get it back into a string, you will need the xml plugin. I cannot find its original source, so here it is:

/**
 * jQuery xml plugin
 * Converts XML node(s) to string 
 *
 * Copyright (c) 2009 Radim Svoboda
 * Dual licensed under the MIT (MIT-LICENSE.txt)
 * and GPL (GPL-LICENSE.txt) licenses.
 *
 * @author  Radim Svoboda, user Zzzzzz
 * @version 1.0.0
 */


/**
 * Converts XML node(s) to string using web-browser features.
 * Similar to .html() with HTML nodes 
 * This method is READ-ONLY.
 *  
 * @param all set to TRUE (1,"all",etc.) process all elements,
 * otherwise process content of the first matched element 
 *  
 * @return string obtained from XML node(s)  
 */         
jQuery.fn.xml = function(all) {
  // Result to return
  var s = "";

  // Anything to process?
  if (this.length) {
    // Nodes to convert: all matched nodes if `all` is truthy,
    // otherwise the contents of the first matched element
    var nodes = all ? this : jQuery(this[0]).contents();

    // Convert node(s) to string
    nodes.each(function() {
      s += window.ActiveXObject
        ? this.xml                                        // for IE
        : (new XMLSerializer()).serializeToString(this);  // for other browsers
    });
  }

  return s;
};
老娘就宠你
#6 · 2019-01-23 20:32

I do not know why anybody should need this, but I suggest you simply dump your source into an iframe. The browser can do the parsing for you. You can even run DOM queries on the result.
