Use XSLT to read from one HTML source and create a

Posted 2019-07-30 01:29

I'm trying to learn how to use XSLT to read from one HTML source and create a new HTML page. I know a bit about using XSLT to read an XML file and create a new HTML page, but going the other way is new to me and I can't find any useful tutorials on the subject.

I'm looking for some basic knowledge to get started, but I don't know how to approach using XSLT to, e.g., select divs and their content from the source HTML and create a new HTML page, perhaps one without the head tag, and so on.

I'd appreciate some basic help or good links on this subject. Thanks! :)

Hi again! Here is the task I need help with, if it's possible. I have one XHTML document that uses a CSS stylesheet. Let's call this XHTML document "B". I want to create a new XHTML document, let's call it "A", that uses some of the divs from "B" with a new CSS stylesheet. The idea is that someone who clicks on "B" would land on "A" instead. I don't know where to start, or whether this is even possible. How do I add a CSS stylesheet in the XSLT code? If it's unclear what I mean, don't hesitate to ask. I'd appreciate any help with this task. Thanks in advance! :)

Tags: xslt
2 answers
霸刀☆藐视天下
#2 · 2019-07-30 01:57

When converting from XHTML to (X)HTML, you may first want to stop the processor from resolving the external DTD referenced by the doctype during the parse phase, as this can be a source of runtime errors.

In that case, check whether your processor has an option to disable this; otherwise you may need to remove the doctype declaration from the input document directly.

For example, in msxsl you can use the -xe option to disable external doctype resolution:

> msxsl test_i.xml test_t.xsl -o test_o.xml -xe

From the point of view of XSLT 1.0, your XHTML is just an XML document with a specific namespace. For instance:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title></title>
    </head>
    <body>
        <p>Foo</p>
    </body>
</html>

To convert this to another XHTML document, your XSLT must:

  • declare the correct default namespace and prefix
  • declare the correct output and doctype

You access the elements in the input document using the defined prefix. For example, this transform just adds a heading to the input document:

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">

    <xsl:output method="html" indent="yes" 
        doctype-public="-//W3C//DTD XHTML 1.1//EN" 
        doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"/>

    <xsl:strip-space elements="*"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="x:body">
        <xsl:copy>
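            <!-- copy body's attributes first: attributes cannot be added after child elements -->
            <xsl:apply-templates select="@*"/>
            <h1>Foo Title</h1>
            <xsl:apply-templates select="node()"/>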
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

Notice:

  • the declaration of the namespace prefix xmlns:x="http://www.w3.org/1999/xhtml" lets you correctly select the elements in the input document, which are qualified in the XHTML namespace.
  • the declaration of the default namespace xmlns="http://www.w3.org/1999/xhtml" prevents the generation of unwanted empty namespace declarations xmlns="" in the output document.
  • the use of exclude-result-prefixes keeps the x prefix declaration for the XHTML namespace off the output elements that are written explicitly in the XSLT.
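
Applied to the task in the question (building page "A" from some of the divs of "B" and linking a new CSS file), a minimal sketch could look like the following. The div ids 'content' and 'footer' and the file name new.css are hypothetical placeholders, not taken from the question. The sketch uses method="xml" because the result is meant to be XHTML; that is a choice made here, not a requirement.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:x="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="x">

    <xsl:output method="xml" indent="yes"
        doctype-public="-//W3C//DTD XHTML 1.1//EN"
        doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"/>

    <!-- Build page "A" from scratch instead of copying all of "B" -->
    <xsl:template match="/x:html">
        <html>
            <head>
                <!-- reuse the title of "B" -->
                <title><xsl:value-of select="x:head/x:title"/></title>
                <!-- new.css is a hypothetical file name -->
                <link rel="stylesheet" type="text/css" href="new.css"/>
            </head>
            <body>
                <!-- 'content' and 'footer' are hypothetical div ids -->
                <xsl:copy-of select="x:body//x:div[@id='content' or @id='footer']"/>
            </body>
        </html>
    </xsl:template>

</xsl:stylesheet>
```

Because there is no identity template here, only what the html template writes ends up in the output, so the old head and its stylesheet link from "B" are left behind. Note that xsl:copy-of copies each selected div with all its descendants, so if one selected div is nested inside another it will appear twice.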

From the point of view of XSLT 2.0, it's much simpler. You can declare the XPath default namespace and so get rid of the prefixes. The stylesheet declaration will be:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">
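
A complete stylesheet built on this declaration might look like the sketch below. It mirrors the XSLT 1.0 transform (identity copy plus a heading added to body) and assumes an XSLT 2.0 processor; method="xhtml" is the XSLT 2.0 output method for XHTML and is a choice made for this sketch.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

    <xsl:output method="xhtml" indent="yes"
        doctype-public="-//W3C//DTD XHTML 1.1//EN"
        doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"/>

    <!-- Identity template: copy everything not matched below -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- "body" matches the XHTML body element without any prefix -->
    <xsl:template match="body">
        <xsl:copy>
            <!-- attributes first: they cannot be added after child elements -->
            <xsl:apply-templates select="@*"/>
            <h1>Foo Title</h1>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

The x prefix and exclude-result-prefixes are no longer needed, since unprefixed element names in match and select expressions now resolve to the XHTML namespace.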
来,给爷笑一个
#3 · 2019-07-30 01:58

Welcome to Stack Overflow!

You may be in one of two situations:

  • Your HTML file is in fact an XHTML file. In this case, nothing changes! XHTML is simply a particular type of XML, and you can use all the normal techniques for processing it. There's nothing special about such input from the perspective of XSLT: learn XSLT and you can apply it to XHTML just fine (of course, feel free to ask specific questions here!)
  • Your HTML file is not XHTML and cannot be parsed by an XML parser. In this case, you'll need to convert the syntax to XML, or use a parser that represents the HTML as an XML tree. HTML Tidy can convert HTML to XHTML (and there are many flavors of it), and HTML Agility Pack, for example, can parse HTML and represent it as XML (note that HTML Agility Pack doesn't support XML namespaces, so if you have any of those in your input, you'll need to remove them first).