We are getting an XML document from a vendor, and we need to run an XSL transform on it using their stylesheet so that we can convert the resulting HTML to a PDF. The stylesheet is referenced in the href attribute of the ?xml-stylesheet processing instruction in the XML document. Is there any way I can get that URL out using C#? I don't trust the vendor not to change the URL, and obviously I don't want to hardcode it.
The start of the XML file, with the full ?xml-stylesheet processing instruction, looks like this:
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.fakeurl.com/StyleSheet.xsl"?>
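LINQ to XML code:
XDocument xDoc = ...;
// Filter on the target as well, so that other processing instructions are skipped
var cssUrlQuery = from node in xDoc.Nodes()
                  where node.NodeType == XmlNodeType.ProcessingInstruction
                        && ((XProcessingInstruction)node).Target == "xml-stylesheet"
                  select Regex.Match(((XProcessingInstruction)node).Data, "href=\"(?<url>.*?)\"").Groups["url"].Value;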
Or LINQ to Objects, where doc is an XmlDocument:
var cssUrls = (from XmlNode childNode in doc.ChildNodes
               where childNode.NodeType == XmlNodeType.ProcessingInstruction && childNode.Name == "xml-stylesheet"
               select (XmlProcessingInstruction)childNode
               into procNode
               select Regex.Match(procNode.Data, "href=\"(?<url>.*?)\"").Groups["url"].Value).ToList();
xDoc.XPathSelectElement() will not work here: it returns an XElement, and a processing instruction is an XProcessingInstruction, which is not an element.
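If you want to stay in LINQ to XML without XPath or Regex, one option is to pick the processing instruction out with OfType and let the XML parser read its pseudo-attributes. The following is only a sketch: the class name StylesheetPi, the method name GetStylesheetHref, and the dummy element name "pi" are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class StylesheetPi
{
    // Hypothetical helper: returns the href pseudo-attribute of the first
    // xml-stylesheet processing instruction, or null if there is none.
    public static string GetStylesheetHref(XDocument xDoc)
    {
        // Processing instructions in the prolog are direct children of the document.
        XProcessingInstruction pi = xDoc.Nodes()
            .OfType<XProcessingInstruction>()
            .FirstOrDefault(p => p.Target == "xml-stylesheet");
        if (pi == null)
            return null;

        // Wrap the PI data in a dummy element so the parser handles the pseudo-attributes.
        XElement wrapper = XElement.Parse("<pi " + pi.Data + " />");

        // The explicit string conversion yields null when the attribute is missing.
        return (string)wrapper.Attribute("href");
    }
}
```

Calling StylesheetPi.GetStylesheetHref(xDoc) on the document from the question should give you the URL from the href; a null result means either the instruction or its href was not there.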
Because a processing instruction can have any contents, it formally does not have any attributes. But if you know it contains "pseudo" attributes, as an xml-stylesheet processing instruction does, then you can use the value of the processing instruction to construct the markup of a single element and parse that with the XML parser:
XmlDocument doc = new XmlDocument();
doc.Load(@"file.xml");
XmlNode pi = doc.SelectSingleNode("processing-instruction('xml-stylesheet')");
if (pi != null)
{
    // Turn the pseudo-attributes into real attributes of a temporary element
    XmlElement piEl = (XmlElement)doc.ReadNode(XmlReader.Create(new StringReader("<pi " + pi.Value + "/>")));
    string href = piEl.GetAttribute("href");
    Console.WriteLine(href);
}
else
{
    Console.WriteLine("No pi found.");
}
You can also use XPath. Given an XmlDocument loaded with your source:
XmlProcessingInstruction instruction = doc.SelectSingleNode("//processing-instruction(\"xml-stylesheet\")") as XmlProcessingInstruction;
if (instruction != null) {
    Console.WriteLine(instruction.InnerText);
}
Then just parse InnerText with Regex.
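The Regex step could look something like the sketch below. The class name PseudoAttributes and the method name ExtractHref are made up for illustration, and the pattern is just one reasonable choice: it accepts either quote style, since XML allows both.

```csharp
using System.Text.RegularExpressions;

public static class PseudoAttributes
{
    // Matches href="..." or href='...'. Group 1 captures the opening quote and
    // the backreference \1 requires the closing quote to be the same character.
    private static readonly Regex HrefPattern =
        new Regex(@"\bhref\s*=\s*([""'])(?<url>.*?)\1");

    // Hypothetical helper: returns the href value, or null if none is found.
    public static string ExtractHref(string piData)
    {
        Match m = HrefPattern.Match(piData ?? string.Empty);
        return m.Success ? m.Groups["url"].Value : null;
    }
}
```

You would call it as PseudoAttributes.ExtractHref(instruction.InnerText). Note that a regex returns the text exactly as written, so an escaped character such as &amp;amp; in the URL is not decoded; if that matters, parsing the value as attributes of a temporary element is the safer route.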
To find the value using a proper XML parser you could write something like this:
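using (var xr = XmlReader.Create(input))
{
    while (xr.Read())
    {
        if (xr.NodeType == XmlNodeType.ProcessingInstruction && xr.Name == "xml-stylesheet")
        {
            string s = xr.Value;
            // Find the start of the href pseudo-attribute; IndexOf returns -1 if it is missing
            int start = s.IndexOf("href=\"");
            if (start >= 0)
            {
                start += 6; // skip past href="
                int end = s.IndexOf('"', start);
                if (end >= 0)
                    Console.WriteLine(s.Substring(start, end - start));
            }
            break;
        }
    }
}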
private string _GetTemplateUrl(XDocument formXmlData)
{
    var infopathInstruction = (XProcessingInstruction)formXmlData.Nodes().First(node => node.NodeType == XmlNodeType.ProcessingInstruction && ((XProcessingInstruction)node).Target == "mso-infoPathSolution");
    var instructionValueAsDoc = XDocument.Parse("<n " + infopathInstruction.Data + " />");
    return instructionValueAsDoc.Root.Attribute("href").Value;
}
XmlProcessingInstruction stylesheet = doc.SelectSingleNode("processing-instruction('xml-stylesheet')") as XmlProcessingInstruction;