How can I get the href attribute value out of an &

Posted 2019-02-21 15:03

Question:

We are getting an XML document from a vendor that we need to perform an XSL transform on using their stylesheet so that we can convert the resulting HTML to a PDF. The actual stylesheet is referenced in an href attribute of the ?xml-stylesheet definition in the XML document. Is there any way that I can get that URL out using C#? I don't trust the vendor not to change the URL and obviously don't want to hardcode it.

The start of the XML file with the full ?xml-stylesheet element looks like this:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.fakeurl.com/StyleSheet.xsl"?>

Answer 1:

LINQ to XML code:

XDocument xDoc = ...;

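var cssUrlQuery = from node in xDoc.Nodes()
        // Only consider the xml-stylesheet instruction, not every processing instruction
        where node.NodeType == XmlNodeType.ProcessingInstruction && ((XProcessingInstruction)node).Target == "xml-stylesheet"
        select Regex.Match(((XProcessingInstruction)node).Data, "href=\"(?<url>.*?)\"").Groups["url"].Value;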

Or LINQ to Objects over an XmlDocument:

var cssUrls = (from XmlNode childNode in doc.ChildNodes
                   where childNode.NodeType == XmlNodeType.ProcessingInstruction && childNode.Name == "xml-stylesheet"
                   select (XmlProcessingInstruction) childNode
                   into procNode select Regex.Match(procNode.Data, "href=\"(?<url>.*?)\"").Groups["url"].Value).ToList();

xDoc.XPathSelectElement() will not work here, because it only returns an XElement and a processing instruction is an XProcessingInstruction, not an element.



Answer 2:

As a processing instruction can have any contents, it formally does not have any attributes. But if you know there are "pseudo" attributes, as in the case of an xml-stylesheet processing instruction, then you can of course use the value of the processing instruction to construct the markup of a single element and parse that with the XML parser:

    XmlDocument doc = new XmlDocument();
    doc.Load(@"file.xml");
    XmlNode pi = doc.SelectSingleNode("processing-instruction('xml-stylesheet')");
    if (pi != null)
    {
        XmlElement piEl = (XmlElement)doc.ReadNode(XmlReader.Create(new StringReader("<pi " + pi.Value + "/>")));
        string href = piEl.GetAttribute("href");
        Console.WriteLine(href);
    }
    else
    {
        Console.WriteLine("No pi found.");
    }


Answer 3:

You can also use XPath. Given an XmlDocument loaded with your source:

XmlProcessingInstruction instruction = doc.SelectSingleNode("//processing-instruction(\"xml-stylesheet\")") as XmlProcessingInstruction;
if (instruction != null) {
    Console.WriteLine(instruction.InnerText);
}

Then just parse InnerText with Regex.
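As a rough sketch of that last step, the helper below pulls the href pseudo-attribute out of the instruction text with a regular expression. The names StylesheetHref and Extract are made up for illustration. The pattern assumes the value is quoted with either double or single quotes, and it does not decode entity references such as &amp; inside the value.

```csharp
using System;
using System.Text.RegularExpressions;

static class StylesheetHref
{
    // Matches href="..." or href='...' inside the processing instruction data.
    // .NET allows the same group name ("url") in both alternatives.
    private static readonly Regex HrefPattern =
        new Regex(@"\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')");

    // Returns the href pseudo-attribute value, or null when none is present.
    public static string Extract(string piData)
    {
        Match m = HrefPattern.Match(piData ?? string.Empty);
        return m.Success ? m.Groups["url"].Value : null;
    }
}
```

With the instruction from the XPath query above, this would be called as StylesheetHref.Extract(instruction.InnerText).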



Answer 4:

To find the value using a proper XML parser you could write something like this:


using(var xr = XmlReader.Create(input))
{
    while(xr.Read())
    {
        if(xr.NodeType == XmlNodeType.ProcessingInstruction && xr.Name == "xml-stylesheet")
        {
            string s = xr.Value;
            int i = s.IndexOf("href=\"");
            if (i >= 0) // only slice when an href pseudo-attribute is present
            {
                i += 6; // skip past href="
                Console.WriteLine(s.Substring(i, s.IndexOf('"', i) - i));
            }
            break;
        }
    }
}


Answer 5:

private string _GetTemplateUrl(XDocument formXmlData) 
{
    var infopathInstruction = (XProcessingInstruction)formXmlData.Nodes().First(node => node.NodeType == XmlNodeType.ProcessingInstruction && ((XProcessingInstruction)node).Target == "mso-infoPathSolution");
    var instructionValueAsDoc = XDocument.Parse("<n " + infopathInstruction.Data + " />");
    return instructionValueAsDoc.Root.Attribute("href").Value;
}


Answer 6:

XmlProcessingInstruction stylesheet = doc.SelectSingleNode("processing-instruction('xml-stylesheet')") as XmlProcessingInstruction;



Tags: c# xml xslt