Lowercase RSS feeds tag names and attributes with

Posted 2019-09-13 08:20

Question:

I have been tasked with processing several RSS feeds that are badly managed by a third party.

My issue is that their capitalisation is very unreliable. For instance, some feeds use the correct element tags <rss>, <item> and <enclosure url="example.mp3">, while others use the incorrect case: <RSS>, <Item> and <Enclosure URL="example.mp3">.

Needless to say, this makes reading the XML (with PHP5 DOMDocument) very tricky.

I found a rather nice XSLT stylesheet here (by the clearly very talented michael.hor257k) which can successfully fix my capitalisation issue: "How to convert tags in all tags in xml to lowercase without changing case of atribute values?"

HOWEVER, while that XSLT stylesheet does lowercase all the elements and attributes, it removes all the namespace declarations (the xmlns:* attributes) from the RSS root element!

For instance, the below RSS XML:

<?xml version="1.0" encoding="UTF-8"?>
<RSS xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0"     EXAMPLEATTRIBUTE="example">
<channel>
...

When run via this XSLT

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />

<xsl:template match="*">
    <xsl:element name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
        <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="@*">
    <xsl:attribute name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

<xsl:template match="comment() | text() | processing-instruction()">
    <xsl:copy/>
</xsl:template>

</xsl:stylesheet>

Will give the following results

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" exampleattribute="example">
<channel>
...

When what I really need is

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0"     exampleattribute ="example">
<channel>
...

I appreciate this is a very niche issue, but it has been stumping me for many hours now; in fact, pretty much my entire day.

TL;DR: Does anyone please know how to...

a) Lowercase all attributes and element tags inside RSS XML using XSLT 1.0

b) Retain all RSS root level namespace declarations when doing this (so the 'atom' and 'itunes' prefixes remain)

I would be incredibly appreciative. Thank you very kindly.

---- Edit: Extra notes, as requested by michael.hor257k in comments ----

Input (notice the capitalised <RSS></RSS>)

<?xml version="1.0" encoding="UTF-8"?><RSS xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <CHANNEL>
    <atom:link href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss" rel="self" type="application/rss+xml"/>
    <atom:link href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss?before=291065752" rel="next" type="application/rss+xml"/>
    <title>FASNASTIC: The Everything FASNASTIC Feed</title>
    <link>http://fasnastic.com</link>
    <pubDate>Thu, 03 Nov 2016 17:21:11 +0000</pubDate>
    <lastBuildDate>Thu, 03 Nov 2016 17:21:11 +0000</lastBuildDate>
    <ttl>60</ttl>
    <language>en</language>
    <copyright>All rights reserved</copyright>
    <webMaster>feeds@soundcloud.com (SoundCloud Feeds)</webMaster>
    <description>FASNASTIC LTD. is a UK things brand who make games and podcasts and things.

Schedule....

THURSDAY: Game Fart Podcast - The FASNASTIC Farts &amp; Video Games Podcast.
Farting out video game news for your pleasure. Please do not listen to this podcast if you treat video games even remotely seriously...

FRIDAY: The Creepy Midnight Podcast - Do you like creepy shit? We do. This FASNASTIC LTD. podcast is designed to be listened to alone, at midnight. Conspiracy theories, science, diseases, technology and aliens.

Download our apps and games on Android &amp; iOS.
Find out more at http://fasnastic.com/</description>
    <itunes:subtitle>FASNASTIC LTD. is a UK things brand who make game…</itunes:subtitle>
    <itunes:owner>
      <itunes:name>FASNASTIC</itunes:name>
      <itunes:email>fasnastic@gmail.com</itunes:email>
    </itunes:owner>
    <itunes:author>FASNASTIC LTD.</itunes:author>
    <itunes:explicit>yes</itunes:explicit>
    <itunes:image href="http://i1.sndcdn.com/avatars-000274631751-z80y47-original.jpg"/>
    <IMAGE>
      <url>http://i1.sndcdn.com/avatars-000274631751-z80y47-original.jpg</url>
      <title>FASNASTIC</title>
      <link>http://fasnastic.com</link>
    </IMAGE>
    <itunes:category text="Comedy"/>
    <ITEM>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/291065752</guid>
      <title>FAS - NAS - TIC</title>
      <pubDate>Wed, 02 Nov 2016 03:00:52 +0000</pubDate>
      <link>https://soundcloud.com/fasnastic/fas-nas-tic</link>
      <itunes:duration>00:00:01</itunes:duration>
      <itunes:author>FASNASTIC LTD.</itunes:author>
      <itunes:explicit>yes</itunes:explicit>
      <itunes:summary>The FASNASTIC LTD. Sonic Logo (created by Esa Juhani Ruoho)</itunes:summary>
      <itunes:subtitle>The FASNASTIC LTD. Sonic Logo (created by Esa Juh…</itunes:subtitle>
      <description>The FASNASTIC LTD. Sonic Logo (created by Esa Juhani Ruoho)</description>
      <enclosure type="audio/mpeg" url="http://www.podtrac.com/pts/redirect.mp3/feeds.soundcloud.com/stream/291065752-fasnastic-fas-nas-tic.mp3" length="31805"/>
      <itunes:image href="http://i1.sndcdn.com/artworks-000191668956-448wgw-original.jpg"/>
    </ITEM>
  </CHANNEL>
</RSS>

Output I am currently getting from the above XSLT. Notice the missing attributes inside <rss>: the atom and iTunes namespace declarations are gone, and a link tag near the top has changed from <atom:link href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss?before=291065752" rel="next" type="application/rss+xml"/> to <link xmlns="http://www.w3.org/2005/Atom" href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss?before=291065752" rel="next" type="application/rss+xml"/>.

<rss version="2.0">
  <channel>
    <link xmlns="http://www.w3.org/2005/Atom" href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss" rel="self" type="application/rss+xml"/>
    <link xmlns="http://www.w3.org/2005/Atom" href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss?before=291065752" rel="next" type="application/rss+xml"/>
    <title>FASNASTIC: The Everything FASNASTIC Feed</title>
    <link>http://fasnastic.com</link>
    <pubdate>Thu, 03 Nov 2016 17:21:11 +0000</pubdate>
    <lastbuilddate>Thu, 03 Nov 2016 17:21:11 +0000</lastbuilddate>
    <ttl>60</ttl>
    <language>en</language>
    <copyright>All rights reserved</copyright>
    <webmaster>feeds@soundcloud.com (SoundCloud Feeds)</webmaster>
    <description>FASNASTIC LTD. is a UK things brand who make games and podcasts and things.

Schedule....

THURSDAY: Game Fart Podcast - The FASNASTIC Farts &amp; Video Games Podcast.
Farting out video game news for your pleasure. Please do not listen to this podcast if you treat video games even remotely seriously...

FRIDAY: The Creepy Midnight Podcast - Do you like creepy shit? We do. This FASNASTIC LTD. podcast is designed to be listened to alone, at midnight. Conspiracy theories, science, diseases, technology and aliens.

Download our apps and games on Android &amp; iOS.
Find out more at http://fasnastic.com/</description>
    <subtitle xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">FASNASTIC LTD. is a UK things brand who make game…</subtitle>
    <owner xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">
      <name>FASNASTIC</name>
      <email>fasnastic@gmail.com</email>
    </owner>
    <author xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">FASNASTIC LTD.</author>
    <explicit xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">yes</explicit>
    <image xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd" href="http://i1.sndcdn.com/avatars-000274631751-z80y47-original.jpg"/>
    <image>
      <url>http://i1.sndcdn.com/avatars-000274631751-z80y47-original.jpg</url>
      <title>FASNASTIC</title>
      <link>http://fasnastic.com</link>
    </image>
    <category xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd" text="Comedy"/>
    <link/>
    <item xmlns:default="http://www.itunes.com/dtds/podcast-1.0.dtd">
      <guid ispermalink="false">tag:soundcloud,2010:tracks/291065752</guid>
      <title>FAS - NAS - TIC</title>
      <pubdate>Wed, 02 Nov 2016 03:00:52 +0000</pubdate>
      <link>https://soundcloud.com/fasnastic/fas-nas-tic</link>
      <default:duration xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">00:00:01</default:duration>
      <default:author xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">FASNASTIC LTD.</default:author>
      <default:explicit xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">yes</default:explicit>
      <default:summary xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">The FASNASTIC LTD. Sonic Logo (created by Esa Juhani Ruoho)</default:summary>
      <default:subtitle xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">The FASNASTIC LTD. Sonic Logo (created by Esa Juh…</default:subtitle>
      <description>The FASNASTIC LTD. Sonic Logo (created by Esa Juhani Ruoho)</description>
      <enclosure type="audio/mpeg" url="http://www.podtrac.com/pts/redirect.mp3/feeds.soundcloud.com/stream/291065752-fasnastic-fas-nas-tic.mp3" length="31805"/>
      <default:image xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd" href="http://i1.sndcdn.com/artworks-000191668956-448wgw-original.jpg"/>
    </item>
  </channel>
</rss>

Desired output (exactly the same as the input, just with <rss> instead of <RSS>, <channel> instead of <CHANNEL>, etc.)

<?xml version="1.0" encoding="UTF-8"?><rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <atom:link href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss" rel="self" type="application/rss+xml"/>
    <atom:link href="http://feeds.soundcloud.com/users/soundcloud:users:245142600/sounds.rss?before=291065752" rel="next" type="application/rss+xml"/>
    <title>FASNASTIC: The Everything FASNASTIC Feed</title>
    <link>http://fasnastic.com</link>
    <pubDate>Thu, 03 Nov 2016 17:21:11 +0000</pubDate>
    <lastBuildDate>Thu, 03 Nov 2016 17:21:11 +0000</lastBuildDate>
    <ttl>60</ttl>
    <language>en</language>
    <copyright>All rights reserved</copyright>
    <webMaster>feeds@soundcloud.com (SoundCloud Feeds)</webMaster>
    <description>FASNASTIC LTD. is a UK things brand who make games and podcasts and things.

Schedule....

THURSDAY: Game Fart Podcast - The FASNASTIC Farts &amp; Video Games Podcast.
Farting out video game news for your pleasure. Please do not listen to this podcast if you treat video games even remotely seriously...

FRIDAY: The Creepy Midnight Podcast - Do you like creepy shit? We do. This FASNASTIC LTD. podcast is designed to be listened to alone, at midnight. Conspiracy theories, science, diseases, technology and aliens.

Download our apps and games on Android &amp; iOS.
Find out more at http://fasnastic.com/</description>
    <itunes:subtitle>FASNASTIC LTD. is a UK things brand who make game…</itunes:subtitle>
    <itunes:owner>
      <itunes:name>FASNASTIC</itunes:name>
      <itunes:email>fasnastic@gmail.com</itunes:email>
    </itunes:owner>
    <itunes:author>FASNASTIC LTD.</itunes:author>
    <itunes:explicit>yes</itunes:explicit>
    <itunes:image href="http://i1.sndcdn.com/avatars-000274631751-z80y47-original.jpg"/>
    <image>
      <url>http://i1.sndcdn.com/avatars-000274631751-z80y47-original.jpg</url>
      <title>FASNASTIC</title>
      <link>http://fasnastic.com</link>
    </image>
    <itunes:category text="Comedy"/>
    <item>
      <guid isPermaLink="false">tag:soundcloud,2010:tracks/291065752</guid>
      <title>FAS - NAS - TIC</title>
      <pubDate>Wed, 02 Nov 2016 03:00:52 +0000</pubDate>
      <link>https://soundcloud.com/fasnastic/fas-nas-tic</link>
      <itunes:duration>00:00:01</itunes:duration>
      <itunes:author>FASNASTIC LTD.</itunes:author>
      <itunes:explicit>yes</itunes:explicit>
      <itunes:summary>The FASNASTIC LTD. Sonic Logo (created by Esa Juhani Ruoho)</itunes:summary>
      <itunes:subtitle>The FASNASTIC LTD. Sonic Logo (created by Esa Juh…</itunes:subtitle>
      <description>The FASNASTIC LTD. Sonic Logo (created by Esa Juhani Ruoho)</description>
      <enclosure type="audio/mpeg" url="http://www.podtrac.com/pts/redirect.mp3/feeds.soundcloud.com/stream/291065752-fasnastic-fas-nas-tic.mp3" length="31805"/>
      <itunes:image href="http://i1.sndcdn.com/artworks-000191668956-448wgw-original.jpg"/>
    </item>
  </channel>
</rss>

This is being run with the PHP script below:

// Load the source feed
$xmlLoaded = new DOMDocument;
$xmlLoaded->preserveWhiteSpace = FALSE;
$xmlLoaded->loadXML($stringContainingOriginalXML);
// Load the stylesheet into its own DOMDocument
// ('lowercase.xsl' is a placeholder path; point it at the stylesheet shown above)
$xslStylesheetObject = new DOMDocument();
$xslStylesheetObject->load('lowercase.xsl');
// Create a processor
$xslStylesheetProcessor = new XSLTProcessor();
$xslStylesheetProcessor->registerPHPFunctions();
// Import the stylesheet into the processor
$xslStylesheetProcessor->importStylesheet($xslStylesheetObject);
// Reload the new XML after processing
$xmlLoaded->loadXML($xslStylesheetProcessor->transformToXML($xmlLoaded));

Answer 1:

This is not a trivial task, and for a good reason. The reason is that semantically there is no difference between these three:

<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
    <itunes:owner>

or:

<rss>
    <owner xmlns="http://www.itunes.com/dtds/podcast-1.0.dtd">

or:

<rss>
    <default:owner xmlns:default="http://www.itunes.com/dtds/podcast-1.0.dtd">
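
One way to see this equivalence is a small stylesheet that prints each element's expanded name, i.e. its namespace URI plus its local name. This is only an illustrative sketch; it assumes each of the three fragments above is first completed into a well-formed document by closing the open tags.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<!-- Print "{namespace-uri}local-name" for every element, one per line.
     The prefix used in the source never appears in the result. -->
<xsl:template match="*">
    <xsl:value-of select="concat('{', namespace-uri(), '}', local-name(), '&#10;')"/>
    <xsl:apply-templates select="*"/>
</xsl:template>

</xsl:stylesheet>
```

Run against any of the three completed fragments, it prints {}rss followed by {http://www.itunes.com/dtds/podcast-1.0.dtd}owner, because the prefix is not part of an element's identity.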

The XSLT language (and the XSLT 1.0 version in particular) provides no tools to control the syntactic form of the output; this is left for the processor to decide at will.

For the same reason, any measures you take to try to enforce your desired syntax may work with one processor and not another.

That said, since you are presumably working with the libxslt processor, I believe all you need to do is add the following template:

<xsl:template match="RSS">
    <rss>
        <xsl:copy-of select="namespace::*"/>
        <xsl:apply-templates select="@*|node()"/>
    </rss>
</xsl:template>
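
Put together with the stylesheet from the question, the result might look like the sketch below. It deviates from the template above in one respect, which is an assumption rather than part of the original suggestion: it matches the document element with `/*` instead of `RSS`, so that roots spelled `<Rss>` or `<rss>` are handled too. Whether the copied namespace declarations lead to names such as `atom:link` being serialised with their original prefixes remains up to the processor; this is expected to work with libxslt but is not guaranteed elsewhere.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>

<!-- Document element, whatever its capitalisation: lowercase the name and
     copy its namespace nodes so the declarations stay on the root.
     Matching "/*" is an assumption; the original template matched "RSS". -->
<xsl:template match="/*">
    <xsl:element name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
        <xsl:copy-of select="namespace::*"/>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
</xsl:template>

<!-- Every other element: lowercase the local name, keep the namespace URI -->
<xsl:template match="*">
    <xsl:element name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
        <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
</xsl:template>

<!-- Attributes: lowercase the name, leave the value untouched -->
<xsl:template match="@*">
    <xsl:attribute name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

<!-- Comments, text and processing instructions are copied as they are -->
<xsl:template match="comment() | text() | processing-instruction()">
    <xsl:copy/>
</xsl:template>

</xsl:stylesheet>
```

The `/*` template takes precedence over the generic `*` template for the document element because its default priority is higher (0.5 versus -0.5). Note that this still lowercases every element and attribute name, so `pubDate` becomes `pubdate` and `isPermaLink` becomes `ispermalink`.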