Change the encoding to UTF-8 on a stream (MemoryMa

Posted 2019-05-17 10:16

Question:

I am using the code below to read a ~2.5 GB XML file as fast as I can (thanks to MemoryMappedFile). However, I am getting the following exception: "'.', hexadecimal value 0x00, is an invalid character. Line 9778, position 73249406.". I believe it is due to an encoding problem. How do I make sure that the MemoryMappedViewStream reads the file as UTF-8?

static void Main(string[] args)
{
    using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
    {
        using (MemoryMappedViewStream stream = file.CreateViewStream())
        {
            Read(stream);
        }
    }
}

static void Read(Stream stream)
{
    using (XmlReader reader = XmlReader.Create(stream))
    {
        reader.MoveToContent();

        while (reader.Read())
        {
        }
    }
}

Answer 1:

You could use the StreamReader class to set the encoding:

static void Main(string[] args)
{
    using (var file = MemoryMappedFile.CreateFromFile(@"d:\temp\temp.xml", FileMode.Open, "MyMemMapFile"))
    {
        using (MemoryMappedViewStream stream = file.CreateViewStream())
        {
            Read(stream);
        }
    }
}

static void Read(Stream stream)
{
    using (XmlReader reader = XmlReader.Create(new StreamReader(stream, Encoding.UTF8)))
    {
        reader.MoveToContent();

        while (reader.Read())
        {
        }
    }
}
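
One more thing worth checking alongside the encoding: the parameterless CreateViewStream() maps the whole file, and views are handed out in units of system pages, so the stream can be longer than the file on disk, with the extra bytes being zeros. That would fit the 0x00 in the exception, and if so, wrapping the stream in a StreamReader would not be expected to remove it. A minimal sketch, assuming the padding is the cause (not confirmed for this file), that sizes the view to the file length instead:

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Xml;

static class Program
{
    static void Main(string[] args)
    {
        const string path = @"d:\temp\temp.xml";

        // Length of the file on disk, in bytes.
        long length = new FileInfo(path).Length;

        using (var file = MemoryMappedFile.CreateFromFile(path, FileMode.Open, "MyMemMapFile"))
        // Map exactly `length` bytes so the stream ends where the file ends,
        // rather than at the next page boundary.
        using (MemoryMappedViewStream stream = file.CreateViewStream(0, length))
        using (XmlReader reader = XmlReader.Create(stream))
        {
            reader.MoveToContent();

            while (reader.Read())
            {
            }
        }
    }
}
```

With the view limited this way, XmlReader can still detect the encoding from the byte order mark or the XML declaration on its own.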

Hope this helps.



Answer 2:

MSDN says the following about XmlReader:

"The XmlReader scans the first bytes of the stream looking for a byte order mark or other sign of encoding"

Does your XML file specify an encoding in its declaration, like this?

<?xml version="1.0" encoding="UTF-8"?>
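
To check what the file actually declares, something like the following could be used. It is only a sketch: EncodingProbe and DetectBom are made-up names for illustration, the path is the one from the question, and the BOM check covers UTF-8 and UTF-16 only (it does not tell UTF-32 apart).

```csharp
using System;
using System.IO;
using System.Xml;

static class EncodingProbe
{
    // Hypothetical helper: returns a label for a recognised byte order mark,
    // or null if the first bytes do not match one. UTF-32 is not distinguished.
    static string DetectBom(byte[] b, int count)
    {
        if (count >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
        if (count >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16 LE";
        if (count >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16 BE";
        return null;
    }

    static void Main(string[] args)
    {
        const string path = @"d:\temp\temp.xml";

        // Read the first few bytes of the file to look for a byte order mark.
        var prefix = new byte[4];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(prefix, 0, prefix.Length);
        }
        Console.WriteLine("BOM: " + (DetectBom(prefix, read) ?? "none"));

        // Let XmlReader parse the declaration and report its encoding attribute.
        using (XmlReader reader = XmlReader.Create(path))
        {
            if (reader.Read() && reader.NodeType == XmlNodeType.XmlDeclaration)
            {
                Console.WriteLine("Declared encoding: " + (reader.GetAttribute("encoding") ?? "not specified"));
            }
            else
            {
                Console.WriteLine("No XML declaration");
            }
        }
    }
}
```

If both the BOM and the declaration agree on UTF-8, the encoding is probably not what produces the 0x00 character.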