C# XML Serialization - Leading Question Marks

Posted 2019-03-19 19:10

Problem

Building on some samples I found online, I've written a pair of XML serialization methods.

  • Method1: Serializes an object and returns (a) its type and (b) the XML string.
  • Method2: Takes (a) and (b) above and gives you back the object.

I noticed that the XML string from Method1 starts with a '?'. This seems to be harmless when Method2 reconstructs the object.

But while testing in the application, we sometimes got a leading '???' instead. That made Method2 throw an exception while reconstructing the object. The object in this case was just a simple int.

System.InvalidOperationException was unhandled
  Message="There is an error in XML document (1, 1)."
  Source="System.Xml"
  StackTrace:
       at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
       at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle)
       at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
       at XMLSerialization.Program.DeserializeXmlStringToObject(String xmlString, String objectType) in C:\Documents and Settings\...Projects\XMLSerialization\Program.cs:line 96
       at XMLSerialization.Program.Main(String[] args) in C:\Documents and Settings\...Projects\XMLSerialization\Program.cs:line 49
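The question doesn't say how the string travelled between Method1 and Method2. The following is therefore only one plausible way to get a leading '???' and the (1, 1) error. It assumes that some layer along the way decoded the bytes as ASCII. The UTF-8 byte order mark (BOM) is three bytes, EF BB BF. ASCII can't represent any of them, so each byte becomes a literal '?' character. The class `TripleQuestionMarkDemo` and its methods are hypothetical names for this sketch.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical demo class; assumes the lossy step is an ASCII decode.
public static class TripleQuestionMarkDemo
{
    // Serializes an int to bytes. A StreamWriter using Encoding.UTF8 writes the
    // BOM (EF BB BF) before the XML declaration, as the question's Method1 appears to.
    public static byte[] SerializeWithBom(int value)
    {
        var serializer = new XmlSerializer(typeof(int));
        using (var stream = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the writer is disposed.
            using (var writer = new StreamWriter(stream, Encoding.UTF8, 1024, leaveOpen: true))
            {
                serializer.Serialize(writer, value);
            }
            return stream.ToArray();
        }
    }

    // Assumed lossy step: ASCII decoding replaces every byte >= 0x80 with '?',
    // so the three BOM bytes turn into "???".
    public static string DecodeAsAscii(byte[] bytes)
    {
        return Encoding.ASCII.GetString(bytes);
    }

    // Same approach as the question's DeserializeXmlStringToObject, fixed to int.
    public static int Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(int));
        using (var stream = new MemoryStream(new UTF8Encoding().GetBytes(xml)))
        {
            return (int)serializer.Deserialize(stream);
        }
    }
}
```

`Deserialize(DecodeAsAscii(SerializeWithBom(5)))` throws an `InvalidOperationException`, because the reader now sees literal '?' characters before the first '<'. Decoding the same bytes as UTF-8 keeps the BOM as a single U+FEFF character instead. That character is re-encoded as a real BOM on the way back, which is why the single leading '?' round-trips.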

Would anyone be able to shed some light on what might be causing this?

Sample Code

Here's sample code from the mini-tester I wrote while coding this up. It runs as a Visual Studio console app and prints the XML string. You can also uncomment the "include leading ???" region to prepend an extra '??' and reproduce the exception, or the "strip out leading ???" region to cut off everything before the first '<'.



using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

namespace XMLSerialization
{
    class Program
    {
        static void Main(string[] args)
        {
            // object to serialize to a string (uncomment one region)
            #region int
            object inObj = 5;
            #endregion

            #region string
            //object inObj = "Testing123";
            #endregion

            #region list
            //List<string> inObj = new List<string>();
            //inObj.Add("0:25");
            //inObj.Add("1:26");
            #endregion

            string[] stringArray = SerializeObjectToXmlString(inObj);

            #region include leading ???
            //int indexOfBracket = stringArray[0].IndexOf('<');
            //stringArray[0] = "??" + stringArray[0];
            #endregion

            #region strip out leading ???
            //int indexOfBracket = stringArray[0].IndexOf('<');
            //string trimmedString = stringArray[0].Substring(indexOfBracket);
            //stringArray[0] = trimmedString;
            #endregion

            Console.WriteLine("Input");
            Console.WriteLine("-----");
            Console.WriteLine("Object Type: " + stringArray[1]);
            Console.WriteLine();
            Console.WriteLine("XML String: " + Environment.NewLine + stringArray[0]);
            Console.WriteLine(String.Empty);

            // deserialize back to object
            object outObj = DeserializeXmlStringToObject(stringArray[0], stringArray[1]);

            Console.WriteLine("Output");
            Console.WriteLine("------");

            #region int
            Console.WriteLine("Object: " + (int)outObj);
            #endregion

            #region string
            //Console.WriteLine("Object: " + (string)outObj);
            #endregion

            #region list
            //string[] tempArray;
            //List<string> list = (List<string>)outObj;

            //foreach (string pair in list)
            //{
            //    tempArray = pair.Split(':');
            //    Console.WriteLine(String.Format("Key:{0} Value:{1}", tempArray[0], tempArray[1]));
            //}
            #endregion

            Console.Read();
        }

        private static string[] SerializeObjectToXmlString(object obj)
        {
            XmlTextWriter writer = new XmlTextWriter(new MemoryStream(), Encoding.UTF8);
            writer.Formatting = Formatting.Indented;
            XmlSerializer serializer = new XmlSerializer(obj.GetType());
            serializer.Serialize(writer, obj);

            MemoryStream stream = (MemoryStream)writer.BaseStream;
            string xmlString = UTF8ByteArrayToString(stream.ToArray());

            string objectType = obj.GetType().FullName;

            return new string[]{xmlString, objectType};
        }

        private static object DeserializeXmlStringToObject(string xmlString, string objectType)
        {
            MemoryStream stream = new MemoryStream(StringToUTF8ByteArray(xmlString));
            XmlSerializer serializer = new XmlSerializer(Type.GetType(objectType));

            object obj = serializer.Deserialize(stream);

            return obj;
        }

        private static string UTF8ByteArrayToString(Byte[] characters)
        {
            UTF8Encoding encoding = new UTF8Encoding();
            return encoding.GetString(characters);
        }

        private static byte[] StringToUTF8ByteArray(String pXmlString)
        {
            UTF8Encoding encoding = new UTF8Encoding();
            return encoding.GetBytes(pXmlString);
        } 


    }
}


2 Answers
冷血范
Post #2 · 2019-03-19 19:36

This is the byte order mark (BOM). You can either remove it:

if (xmlString.Length > 0 && xmlString[0] != '<')
{
    xmlString = xmlString.Substring(1, xmlString.Length - 1);
}
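Here is some background on where the single '?' comes from. This is a minimal sketch that assumes the XML is written to a MemoryStream and then decoded as UTF-8, as in the question. `UTF8Encoding.GetString` does not strip the BOM. It decodes EF BB BF to the invisible character U+FEFF, which typically shows up as '?' in the console. The names `BomDemo`, `Serialize` and `StripBom` are hypothetical. `StripBom` removes only that character, which is a little safer than dropping whatever the first character happens to be.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical helper class for illustration only.
public static class BomDemo
{
    // Writes the XML to a MemoryStream with the given encoding, then decodes
    // the bytes as UTF-8, much like the question's SerializeObjectToXmlString.
    public static string Serialize(object obj, Encoding encoding)
    {
        var serializer = new XmlSerializer(obj.GetType());
        var settings = new XmlWriterSettings { Encoding = encoding, Indent = true };
        using (var stream = new MemoryStream())
        {
            // XmlWriter emits the encoding's preamble (if any) at the start of the stream.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, obj);
            }
            // GetString keeps the BOM: EF BB BF becomes the character U+FEFF.
            return new UTF8Encoding().GetString(stream.ToArray());
        }
    }

    // Removes a leading U+FEFF (if any) and nothing else.
    public static string StripBom(string xml)
    {
        return xml.TrimStart('\uFEFF');
    }
}
```

`BomDemo.Serialize(5, Encoding.UTF8)` starts with '\uFEFF'. `BomDemo.Serialize(5, new UTF8Encoding(false))` starts with '<', because that encoding has no preamble.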

Or serialize straight to a string with a StringWriter. The output is UTF-16 text, so no UTF-8 BOM is involved:

using (StringWriter writer = new StringWriter(CultureInfo.InvariantCulture))
{
    serializer.Serialize(writer, instance);
    result = writer.ToString();
}

And to deserialize:

object result;
using (StringReader reader = new StringReader(xmlString))
{
    result = serializer.Deserialize(reader);
}

If you only use this code inside .NET applications, the UTF-16 output won't create problems, since UTF-16 is the encoding .NET strings use internally.
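The two snippets above rely on a `serializer` and input defined elsewhere. Here is a self-contained sketch of the same StringWriter/StringReader idea, made generic so the type doesn't have to be passed around. The class name `StringXmlSerializer` is hypothetical. Because the XML never passes through a byte array, no BOM can end up in the string. The declaration it writes says `encoding="utf-16"`. That is fine as long as the string is read back through a `StringReader`. If you re-encode it as UTF-8 bytes and parse those, the bytes would contradict the declaration.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml.Serialization;

// Hypothetical helper: string-based XML round trip with no byte arrays involved.
public static class StringXmlSerializer
{
    public static string Serialize<T>(T instance)
    {
        var serializer = new XmlSerializer(typeof(T));
        // StringWriter reports UTF-16, so the declaration says encoding="utf-16".
        using (var writer = new StringWriter(CultureInfo.InvariantCulture))
        {
            serializer.Serialize(writer, instance);
            return writer.ToString();
        }
    }

    public static T Deserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

For example, `StringXmlSerializer.Serialize(5)` returns a string starting with `<?xml`, and `Deserialize<int>` on that string gives back 5. The same pair round-trips a `List<string>` such as the question's "0:25"/"1:26" list.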

迷人小祖宗
Post #3 · 2019-03-19 19:42

When I've come across this before, it usually had to do with encoding. I'd try specifying the encoding when you serialize your object. Try using the following code. Also, is there any specific reason why you need to return a string[] array? I've changed your methods to use generics so you don't have to specify a type.

private static string SerializeObjectToXmlString<T>(T obj)
{
    XmlSerializer xmls = new XmlSerializer(typeof(T));
    using (MemoryStream ms = new MemoryStream())
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        // UTF-8 without a BOM; Encoding.UTF8 would write EF BB BF before the XML
        settings.Encoding = new UTF8Encoding(false);
        settings.Indent = true;
        settings.IndentChars = "\t";
        settings.NewLineChars = Environment.NewLine;
        settings.ConformanceLevel = ConformanceLevel.Document;

        using (XmlWriter writer = XmlWriter.Create(ms, settings))
        {
            xmls.Serialize(writer, obj);
        }

        string xml = Encoding.UTF8.GetString(ms.ToArray());
        return xml;
    }
}

private static T DeserializeXmlStringToObject<T>(string xmlString)
{
    XmlSerializer xmls = new XmlSerializer(typeof(T));

    using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(xmlString)))
    {
        return (T)xmls.Deserialize(ms);
    }
}

If you still have problems, try using Encoding.ASCII in your code anywhere you see Encoding.UTF8, unless you have a specific reason for using UTF8. I'm not sure of the cause, but I've seen UTF8 encoding cause this exact problem in certain cases when serializing.
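One likely reason switching to `Encoding.ASCII` helps is the preamble. Writers such as `StreamWriter` and `XmlWriter` emit the encoding's preamble (`Encoding.GetPreamble()`) at the start of a new stream. `Encoding.UTF8` has a three-byte preamble (the BOM), while `Encoding.ASCII` has none. Here is a small sketch that prints the preambles; `PreambleDemo` is a hypothetical name.

```csharp
using System;
using System.Text;

// Hypothetical helper: shows which encodings carry a preamble (BOM).
public static class PreambleDemo
{
    // Hex dump of the preamble, e.g. "EF-BB-BF"; empty string if there is none.
    public static string Describe(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Print()
    {
        Console.WriteLine("Encoding.UTF8: " + Describe(Encoding.UTF8));
        Console.WriteLine("new UTF8Encoding(false): " + Describe(new UTF8Encoding(false)));
        Console.WriteLine("Encoding.ASCII: " + Describe(Encoding.ASCII));
    }
}
```

`new UTF8Encoding(false)` has no preamble either. It therefore avoids the BOM in the same way while keeping the output UTF-8.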
