I am looking for a fast way to deserialize XML that has special characters in it, like ö.
I was using XmlReader and it fails to deserialize such characters.
Any suggestions?
EDIT: I am using C#.
The code is as follows:
```csharp
XElement element = .. // has the xml
XmlSerializer serializer = new XmlSerializer(typeof(MyType));
XmlReader reader = element.CreateReader();
Object o = serializer.Deserialize(reader);
```
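A quick way to test this code path in isolation is to feed it XML text that is already a correctly decoded .NET string. This is a minimal sketch, assuming a hypothetical `MyType` with a single string property `Name` (the real `MyType` isn't shown), and a hypothetical helper class `QuestionPathCheck`. If it returns ö intact, the characters were fine inside the `XElement`, and any corruption would have happened earlier, when the XML bytes were decoded into text.

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyType (the real type is not shown).
public class MyType
{
    public string Name { get; set; }
}

public static class QuestionPathCheck
{
    // Runs the question's XElement -> XmlReader -> XmlSerializer path
    // and returns the deserialized Name property.
    public static string DeserializeName(string xml)
    {
        XElement element = XElement.Parse(xml); // xml is already decoded text
        XmlSerializer serializer = new XmlSerializer(typeof(MyType));
        using (XmlReader reader = element.CreateReader())
        {
            MyType o = (MyType)serializer.Deserialize(reader);
            return o.Name;
        }
    }
}
```

For example, `QuestionPathCheck.DeserializeName("<MyType><Name>ö</Name></MyType>")` should give back `"ö"`.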
I'd guess you're having an encoding issue, not in the XmlReader but with the XmlSerializer.
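To illustrate what such an encoding issue looks like, here is a minimal sketch (not part of the original answer; the class name `EncodingMismatchDemo` is hypothetical) that takes the UTF-8 bytes of a string and decodes them with a different encoding. With ISO-8859-1 (code page 28591), the two UTF-8 bytes of ö (0xC3 0xB6) come back as two separate characters, "Ã¶". If your deserialized strings look like that, the bytes and the encoding used to read them don't match.

```csharp
using System.Text;

public static class EncodingMismatchDemo
{
    // Encodes text as UTF-8, then decodes those bytes with the given encoding.
    // Passing anything other than a UTF-8 encoding simulates a mismatch.
    public static string DecodeUtf8BytesAs(string text, Encoding decodeWith)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text); // "ö" -> 0xC3 0xB6
        return decodeWith.GetString(utf8Bytes);
    }
}
```

Calling `EncodingMismatchDemo.DecodeUtf8BytesAs("ö", Encoding.GetEncoding(28591))` returns `"\u00C3\u00B6"`, while decoding with `Encoding.UTF8` returns `"ö"` unchanged.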
You could use the XmlTextWriter and UTF-8 encoding with the XmlSerializer, as in the following snippet (see the generic methods below for a much nicer implementation). It works just fine with umlauts (äöü) and other special characters.
```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

class Program
{
    static void Main(string[] args)
    {
        SpecialCharacters specialCharacters = new SpecialCharacters { Umlaute = "äüö" };

        // serialize object to xml
        string serializedXml;
        XmlSerializer xmlSerializerSerialize = new XmlSerializer(typeof(SpecialCharacters));
        using (MemoryStream memoryStreamSerialize = new MemoryStream())
        // UTF-8 without a byte order mark, so serializedXml doesn't start with U+FEFF
        using (XmlTextWriter xmlTextWriterSerialize = new XmlTextWriter(memoryStreamSerialize, new UTF8Encoding(false)))
        {
            xmlSerializerSerialize.Serialize(xmlTextWriterSerialize, specialCharacters);
            xmlTextWriterSerialize.Flush();
            // converts the UTF-8 encoded bytes to a string
            serializedXml = Encoding.UTF8.GetString(memoryStreamSerialize.ToArray());
        }

        // deserialize xml to object
        // converts the string back to a UTF-8 byte array
        byte[] byteArray = Encoding.UTF8.GetBytes(serializedXml);
        using (MemoryStream memoryStreamDeserialize = new MemoryStream(byteArray))
        {
            XmlSerializer xmlSerializerDeserialize = new XmlSerializer(typeof(SpecialCharacters));
            // no XmlTextWriter is needed here; the serializer reads the stream directly
            SpecialCharacters deserializedObject = (SpecialCharacters)xmlSerializerDeserialize.Deserialize(memoryStreamDeserialize);
            Console.WriteLine(deserializedObject.Umlaute);
        }
    }
}

[Serializable]
public class SpecialCharacters
{
    public string Umlaute { get; set; }
}
```
I personally use the following generic methods to serialize objects to XML and deserialize XML back to objects, and I haven't had any performance or encoding issues with them yet.
```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static string SerializeObjectToXml<T>(T obj)
{
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
    using (MemoryStream memoryStream = new MemoryStream())
    // UTF-8 without a byte order mark, so the returned string doesn't start with U+FEFF
    using (XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, new UTF8Encoding(false)))
    {
        xmlSerializer.Serialize(xmlTextWriter, obj);
        xmlTextWriter.Flush();
        return ByteArrayToStringUtf8(memoryStream.ToArray());
    }
}

public static T DeserializeXmlToObject<T>(string xml)
{
    using (MemoryStream memoryStream = new MemoryStream(StringToByteArrayUtf8(xml)))
    using (StreamReader xmlStreamReader = new StreamReader(memoryStream, Encoding.UTF8))
    {
        XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
        return (T)xmlSerializer.Deserialize(xmlStreamReader);
    }
}

public static string ByteArrayToStringUtf8(byte[] value)
{
    UTF8Encoding encoding = new UTF8Encoding();
    return encoding.GetString(value);
}

public static byte[] StringToByteArrayUtf8(string value)
{
    UTF8Encoding encoding = new UTF8Encoding();
    return encoding.GetBytes(value);
}
```
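Microsoft's documentation recommends `XmlWriter.Create` over constructing `XmlTextWriter` directly. As a swapped-in alternative (not the method used above), here is a minimal sketch of the same UTF-8 round trip with `XmlWriterSettings`; the class name `XmlWriterCreateSketch` is hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class XmlWriterCreateSketch
{
    // Serializes obj to a UTF-8 XML string using XmlWriter.Create.
    public static string Serialize<T>(T obj)
    {
        XmlWriterSettings settings = new XmlWriterSettings
        {
            // UTF-8 without a byte order mark, so the string has no leading U+FEFF
            Encoding = new UTF8Encoding(false)
        };
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (MemoryStream stream = new MemoryStream())
        {
            // CloseOutput defaults to false, so disposing the writer leaves the stream open
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, obj);
            }
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Turns the string back into UTF-8 bytes and lets the serializer read them.
    public static T Deserialize<T>(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (T)serializer.Deserialize(stream);
        }
    }
}
```

With a plain string, `XmlWriterCreateSketch.Deserialize<string>(XmlWriterCreateSketch.Serialize("äöü"))` should return `"äöü"`, and the umlauts appear unescaped in the intermediate XML.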
What works for me is similar to what @martin-buberl suggested:
```csharp
public static T DeserializeXmlToObject<T>(string xml)
{
    using (MemoryStream memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
    using (StreamReader reader = new StreamReader(memoryStream, Encoding.UTF8))
    {
        XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
        return (T)xmlSerializer.Deserialize(reader);
    }
}
```
```csharp
[XmlElement(ElementName = "Profiles")]
//public ProfilesType[] Profiles { get; set; }
public Profiles Profiles { get; set; }
```
Have you tried something like the above?
I haven't checked, but this sprang to mind. I managed to serialize and deserialize data that contains åäö etc.
You're not talking about tag names, are you?
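If the special characters are in the tag names rather than in the text, that should work too: XML names may contain letters such as ö and ß. This is a minimal sketch, using a hypothetical `Artikel` class whose `Size` property is mapped to the element name `Größe` with the same `XmlElement(ElementName = ...)` attribute used above; `TagNameSketch` is also a hypothetical name.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type whose XML element name contains non-ASCII letters.
public class Artikel
{
    [XmlElement(ElementName = "Größe")]
    public string Size { get; set; }
}

public static class TagNameSketch
{
    // Serializes to an in-memory string (StringWriter, so no byte encoding is involved).
    public static string Serialize(Artikel value)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Artikel));
        using (StringWriter writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Reads the XML string back into an Artikel.
    public static Artikel Deserialize(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Artikel));
        using (StringReader reader = new StringReader(xml))
        {
            return (Artikel)serializer.Deserialize(reader);
        }
    }
}
```

Serializing `new Artikel { Size = "ö" }` should produce an element `<Größe>ö</Größe>`, and deserializing that output should restore `Size` as `"ö"`.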