Obtaining the XML encoding from an XML declaration

Posted 2019-02-18 05:54

I'm working on some code to read an XML fragment which contains an XML declaration, e.g. <?xml version="1.0" encoding="utf-8"?> and parse the encoding. From MSDN, I should be able to do it like this:

var nt = new NameTable();
var mgr = new XmlNamespaceManager(nt);
var context = new XmlParserContext(null, mgr, null, XmlSpace.None);

var reader = new System.Xml.XmlTextReader(@"<?xml version=""1.0"" encoding=""UTF-8""?>", 
    System.Xml.XmlNodeType.XmlDeclaration, context);

However, I'm getting a System.Xml.XmlException on the call to the System.Xml.XmlTextReader constructor with an error message:

XmlNodeType XmlDeclaration is not supported for partial content parsing.

I've googled this error in quotes -- exactly zero results found (edit: now there's one result: this post) -- and without quotes, which yields nothing useful. I've also looked at MSDN for the XmlNodeType, and it doesn't say anything about it not being supported.

What am I missing here? How can I get an XmlTextReader instance from an XML declaration fragment?

Note, my goal here is just to determine the encoding of a partially-built XML document where I'm making the assumption that it at least contains a declaration node; thus, I'm trying to get reader.Encoding. If there's another way to do that, I'm open to that.

At present, I'm parsing the declaration manually using regex, which is not the best approach.

4 answers
淡お忘
#2 · 2019-02-18 06:18

This may be late, but you can use the code below after loading the XML into an XmlDocument:

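    static string getEncoding(XmlDocument xml)
    {
        // FirstChild is null for an empty document, and Encoding is an empty
        // string when the declaration has no encoding attribute.
        var declaration = xml.FirstChild as XmlDeclaration;
        if (declaration != null && !string.IsNullOrEmpty(declaration.Encoding))
        {
            return declaration.Encoding;
        }
        // No declared encoding: fall back to "utf-8".
        return "utf-8";
    }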
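
A minimal usage sketch, not part of the original answer: the helper is repeated so the snippet compiles on its own, and the names `DeclarationEncodingDemo`, `GetEncodingName`, and `FromString` are made up for illustration. One caveat worth knowing: `XmlDocument.LoadXml` requires a root element, so a declaration-only fragment like the one in the question throws an `XmlException` before the helper is ever reached.

```csharp
using System;
using System.Xml;

static class DeclarationEncodingDemo
{
    // Same idea as getEncoding above: read the encoding name from the
    // document's XML declaration, falling back to "utf-8".
    public static string GetEncodingName(XmlDocument xml)
    {
        var declaration = xml.FirstChild as XmlDeclaration;
        if (declaration != null && !string.IsNullOrEmpty(declaration.Encoding))
        {
            return declaration.Encoding;
        }
        return "utf-8";
    }

    // Hypothetical convenience wrapper: parse a string, then inspect it.
    public static string FromString(string xmlText)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xmlText); // throws XmlException if there is no root element
        return GetEncodingName(doc);
    }
}
```

Calling `DeclarationEncodingDemo.FromString` with a complete document returns the encoding name as it is written in the declaration, and returns the fallback when the document has no declaration at all.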
ら.Afraid
#3 · 2019-02-18 06:25

If you have a byte array as input, try something like this:

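private Encoding getEncoding(byte[] data)
{
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.DtdProcessing = DtdProcessing.Ignore;
    XmlDocument doc = new XmlDocument();
    // Dispose the stream and the reader once the document is loaded.
    using (MemoryStream ms = new MemoryStream(data))
    using (XmlReader reader = XmlReader.Create(ms, settings))
    {
        doc.Load(reader);
    }
    // OfType and FirstOrDefault require "using System.Linq;".
    XmlDeclaration declaration = doc.ChildNodes.OfType<XmlDeclaration>().FirstOrDefault();
    // No declaration, or a declaration without an encoding attribute: fall back to UTF-8.
    if (declaration == null || string.IsNullOrEmpty(declaration.Encoding))
        return Encoding.UTF8;
    return Encoding.GetEncoding(declaration.Encoding);
}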
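
A self-contained sketch of how this might be called, not part of the original answer. The class name `ByteArrayEncodingDemo`, the `SampleBytes` helper, and the sample document are assumptions made up for illustration; the UTF-8 fallback for a missing declaration is also a choice of this sketch. As with any `XmlDocument` approach, the bytes must form a complete document with a root element.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;

static class ByteArrayEncodingDemo
{
    // Loads the bytes as an XML document and maps the declared encoding
    // name to a System.Text.Encoding instance.
    public static Encoding GetEncoding(byte[] data)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        var doc = new XmlDocument();
        using (var ms = new MemoryStream(data))
        using (var reader = XmlReader.Create(ms, settings))
        {
            doc.Load(reader);
        }

        var declaration = doc.ChildNodes.OfType<XmlDeclaration>().FirstOrDefault();
        if (declaration == null || string.IsNullOrEmpty(declaration.Encoding))
        {
            return Encoding.UTF8; // assumption: default when nothing is declared
        }
        return Encoding.GetEncoding(declaration.Encoding);
    }

    // Hypothetical sample input: an ASCII-encoded document that says so.
    public static byte[] SampleBytes()
    {
        const string xml = "<?xml version=\"1.0\" encoding=\"us-ascii\"?><note>hi</note>";
        return Encoding.ASCII.GetBytes(xml);
    }
}
```

`ByteArrayEncodingDemo.GetEncoding(ByteArrayEncodingDemo.SampleBytes())` should yield the ASCII encoding, while bytes for a document with no declaration take the UTF-8 fallback.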
孤傲高冷的网名
#4 · 2019-02-18 06:26

Update: Getting the encoding from an XML document or from an XML fragment:

Here's a way to get the encoding without having to resort to a fake root element, using XmlReader.Create.

private static string GetXmlEncoding(string xmlString)
{
    if (string.IsNullOrWhiteSpace(xmlString)) throw new ArgumentException("The provided string value is null, empty, or whitespace.", nameof(xmlString));

    using (var stringReader = new StringReader(xmlString))
    {
        // Fragment conformance lets the reader accept input that has no root element.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            if (!xmlReader.Read()) throw new ArgumentException(
                "The provided XML string does not contain enough data to be valid XML (see https://msdn.microsoft.com/en-us/library/system.xml.xmlreader.read)");

            // On the XmlDeclaration node, "encoding" is exposed as an attribute.
            // GetAttribute returns null when the attribute is absent.
            var result = xmlReader.GetAttribute("encoding");
            return result;
        }
    }
}

Here's the output, with a full and fragment XML:

[Image: XML encoding with XmlReader.Create]
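
Since the screenshot does not survive as text, here is a small sketch of a call that would produce that kind of output, not part of the original answer. The class name `FragmentEncodingDemo`, the `PrintSamples` helper, and the two sample strings are assumptions for illustration; unlike the method above, this variant returns null instead of throwing when the input does not start with an XML declaration.

```csharp
using System;
using System.IO;
using System.Xml;

static class FragmentEncodingDemo
{
    // Reads only the first node and returns the declared encoding name,
    // or null when the input does not start with an XML declaration.
    public static string GetXmlEncoding(string xmlString)
    {
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

        using (var stringReader = new StringReader(xmlString))
        using (var xmlReader = XmlReader.Create(stringReader, settings))
        {
            if (!xmlReader.Read() || xmlReader.NodeType != XmlNodeType.XmlDeclaration)
            {
                return null;
            }
            return xmlReader.GetAttribute("encoding");
        }
    }

    // Hypothetical driver: one declaration-only fragment, one full document.
    public static void PrintSamples()
    {
        const string declarationOnly = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        const string fullDocument =
            "<?xml version=\"1.0\" encoding=\"ASCII\"?><note><to>Tove</to></note>";

        Console.WriteLine(GetXmlEncoding(declarationOnly));
        Console.WriteLine(GetXmlEncoding(fullDocument));
    }
}
```

Because the input comes from a string rather than a byte stream, the reader does not switch decoders based on the declaration; the returned value is simply the name as written.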

If you want to have System.Text.Encoding, you can modify the code to look like this:

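    private static Encoding GetXmlEncoding(string xmlString)
    {
        using (StringReader stringReader = new StringReader(xmlString))
        {
            var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };

            // Dispose the XmlReader as well as the underlying StringReader.
            using (var reader = XmlReader.Create(stringReader, settings))
            {
                reader.Read();

                // Null when the first node has no encoding attribute.
                var encoding = reader.GetAttribute("encoding");

                // Encoding.GetEncoding(null) throws, so report a missing encoding as null.
                return encoding == null ? null : Encoding.GetEncoding(encoding);
            }
        }
    }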

Old answer:

As you mentioned, XmlTextReader's Encoding property contains the encoding.

Here's the full source of a console app, which is hopefully useful:

using System;
using System.IO;
using System.Text;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        var asciiXML = @"<?xml version=""1.0"" encoding=""ASCII""?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";
        var utf8XML = @"<?xml version=""1.0"" encoding=""UTF-8""?><note><to>Tove</to><from>Jani</from><heading>Reminder</heading><body>Don't forget me this weekend!</body></note>";

        var asciiResult = GetXmlEncoding(asciiXML);
        var utfResult = GetXmlEncoding(utf8XML);

        Console.WriteLine(asciiResult);
        Console.WriteLine(utfResult);

        Console.ReadLine();
    }
    private static Encoding GetXmlEncoding(string s)
    {
        var stream = new MemoryStream(Encoding.UTF8.GetBytes(s));

        using (var xmlreader = new XmlTextReader(stream))
        {
            xmlreader.MoveToContent();
            var encoding = xmlreader.Encoding;

            return encoding;
        }
    }
}

Here's the output from the program:

[Image: XML encoding output]

If you know that the XML contains only the declaration, you could append an empty root element. For example:

        var fragmentResult = GetXmlEncoding(xmlFragment + "<root/>");

[Image: XML fragment]

老娘就宠你
#5 · 2019-02-18 06:26

Good evening. Here's a solution that returns a System.Text.Encoding. I wrote it step by step to keep it clear.

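using System.IO;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        // YourFileName is a placeholder for the path to the XML file.
        // This assumes the whole XML declaration sits on the first line.
        var line = File.ReadLines(YourFileName).First();
        // Append an empty root element so the declaration parses as a complete document.
        var correctXml = line + "<Root></Root>";
        var xml = XDocument.Parse(correctXml);
        // Declaration.Encoding is the encoding name as written, e.g. "utf-8".
        var stringEncoding = xml.Declaration.Encoding;
        var encoding = System.Text.Encoding.GetEncoding(stringEncoding);
    }
}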