I need to represent content in a lingua franca, which nowadays is the HTML5 standard. My objective is not to show a page in a web browser: I need to represent only content, with no interface, no layout, and no logic (no JavaScript).
As noted in other questions (and in Programmers questions), and in the "HTML vs XHTML" section of the W3C HTML5 Recommendation,
the DOM, the HTML syntax, and the XHTML syntax cannot all represent the same content.
OK, but ~90% can be the same (!), and if I do not need JavaScript, styles, etc., and I can enforce some constraints, it will be 100%. So the question is: what constraints do I need to enforce to ensure that any HTML5 document serialized as XHTML5 represents the same thing, and vice versa (for example, an XSLT that gives back the original HTML5 document)?
Is there a "subset of HTML5 elements", or a "subset with some additional constraints", that ensures the reversibility of XHTML5/HTML5 conversions?
Polyglot Markup: A robust profile of the HTML5 vocabulary, which is currently a W3C Candidate Recommendation, defines rules for a document that yields the same document tree whether it is parsed as HTML or as XML.
You can find the rules for writing such a document in section 4: Writing HTML documents.
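
To get a feel for what "same content in both syntaxes" means in practice, here is a minimal Python sketch that parses one string twice, once as XML and once with a lenient HTML tokenizer, and compares the resulting element/text events. Everything in it is my own illustration: `EventCollector`, `xml_events`, `same_tree`, and the sample document are hypothetical names, and the standard-library `html.parser` is not a spec-compliant HTML5 parser (for example, it does not insert implied elements and treats any `<x/>` as self-closing), so treat this as a rough sanity check rather than a polyglot conformance test.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


class EventCollector(HTMLParser):
    """Records start/end/text events as seen by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # The XML parser consumes xmlns as a namespace declaration, so drop it here.
        kept = sorted((k, v) for k, v in attrs if k != "xmlns")
        self.events.append(("start", tag, tuple(kept)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Assumption: whitespace-only text is ignored for the comparison.
        if data.strip():
            self.events.append(("text", data.strip()))


def xml_events(elem):
    """Flattens an ElementTree element into the same event shape."""
    tag = elem.tag.replace(XHTML_NS, "", 1)
    events = [("start", tag, tuple(sorted(elem.attrib.items())))]
    if elem.text and elem.text.strip():
        events.append(("text", elem.text.strip()))
    for child in elem:
        events.extend(xml_events(child))
        if child.tail and child.tail.strip():
            events.append(("text", child.tail.strip()))
    events.append(("end", tag))
    return events


def same_tree(markup):
    """True if the markup is well-formed XML and both parsers see the same events."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return False
    collector = EventCollector()
    collector.feed(markup)
    collector.close()
    return collector.events == xml_events(root)


# Hypothetical sample written in a polyglot style: XHTML namespace on the root,
# quoted attributes, and void elements written as <br/> and <meta .../>.
POLYGLOT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta charset="utf-8"/><title>Demo</title></head>
<body><p id="a">Hello<br/>world</p></body>
</html>"""

print(same_tree(POLYGLOT))                 # polyglot-style document
print(same_tree("<p>Hello<br>world</p>"))  # HTML-only syntax: not well-formed XML
```

The second input shows the kind of thing the constraints rule out: a bare `<br>` is fine in the HTML syntax but is not well-formed XML, so no XHTML round trip is possible for it. For a real check you would want an HTML5-conformant parser in place of `html.parser`.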