' ', hexadecimal value 0x1F, is an invalid character

Posted 2020-04-02 06:19

Question:

I am trying to read an XML file from the web and parse it using XDocument. It normally works fine, but sometimes it gives me this error for a whole day:

 **' ', hexadecimal value 0x1F, is an invalid character. Line 1, position 1**

I have tried some solutions from Google, but they don't work in VS 2010 Express for Windows Phone 7.

One solution replaces the 0x1F character with string.Empty, but my code gets a stream, which doesn't have a Replace method.

s = s.Replace(Convert.ToString((byte)0x1F), string.Empty);

Here is my code:

    void webClient_OpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
    {
        using (var reader = new StreamReader(e.Result))
        {
            int[] counter = { 1 };  
            string s = reader.ReadToEnd();
            Stream str = e.Result;
            // s = s.Replace(Convert.ToString((byte)0x1F), string.Empty);
            // byte[] str = Convert.FromBase64String(s);
            // Stream memStream = new MemoryStream(str);
            str.Position = 0;
            XDocument xdoc = XDocument.Load(str);                

            var data = from query in xdoc.Descendants("user")
                       select new mobion
                       {
                           index = counter[0]++,
                           avlink = (string)query.Element("user_info").Element("avlink"),
                           nickname = (string)query.Element("user_info").Element("nickname"),
                           track = (string)query.Element("track"),
                           artist = (string)query.Element("artist"),
                       };
            listBox.ItemsSource = data;
        }
    }

XML file: http://music.mobion.vn/api/v1/music/userstop?devid=

Answer 1:

Consider using System.Web.HttpUtility.HtmlDecode if you're decoding content read from the web.



Answer 2:

0x1F is an ASCII control character (the unit separator). It is not a valid character in XML 1.0. Your best bet is to replace it.

Instead of using reader.ReadToEnd() (which, for a large file, can use a lot of memory, though it does work), why not try something like:

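// sr is the StreamReader over the response; StringBuilder needs "using System.Text;".
var cleaned = new StringBuilder();
string input;
while ((input = sr.ReadLine()) != null)
{
    // Swap the illegal 0x1F control character for a space.
    cleaned.AppendLine(input.Replace((char)0x1F, ' '));
}

You can convert the result back into a stream, if you'd like, and use it as you please.

// UTF-8 keeps non-ASCII text intact; Encoding.ASCII would turn it into '?'.
byte[] byteArray = Encoding.UTF8.GetBytes(cleaned.ToString());
MemoryStream stream = new MemoryStream(byteArray);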

Alternatively, you could keep calling ReadToEnd(), clean the resulting string of illegal characters, and convert it back to a stream.

Here's a good resource for cleaning illegal characters out of your XML. Chances are you'll have others as well:

https://seattlesoftware.wordpress.com/tag/hexadecimal-value-0x-is-an-invalid-character/
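The "read everything, clean the string, then parse" approach described above can be sketched as follows. This is a minimal sketch, not the linked article's code: `XmlCleaner`, `StripInvalidXmlChars`, and `LoadCleaned` are hypothetical names, and it assumes the response is text that merely contains stray control characters (it will not help if the body is compressed).

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class XmlCleaner
{
    // True if c is allowed by the XML 1.0 Char production.
    // Surrogate halves are let through so that valid pairs survive.
    private static bool IsAllowedXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= '\u0020' && c <= '\uD7FF')
            || char.IsSurrogate(c)
            || (c >= '\uE000' && c <= '\uFFFD');
    }

    // Returns a copy of text with every disallowed character removed.
    public static string StripInvalidXmlChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (IsAllowedXmlChar(c)) sb.Append(c);
        }
        return sb.ToString();
    }

    // Reads the whole stream as text, cleans it, and parses the result.
    public static XDocument LoadCleaned(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            return XDocument.Parse(StripInvalidXmlChars(reader.ReadToEnd()));
        }
    }
}
```

In the question's handler, this would mean calling `XmlCleaner.LoadCleaned(e.Result)` instead of reading the stream and then loading it a second time. Parsing from the cleaned string avoids the round trip back through a MemoryStream.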



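Answer 3:

What could be happening is that the content is compressed, in which case you need to decompress it. A gzip stream begins with the bytes 0x1F 0x8B, which would explain why the parser rejects 0x1F at line 1, position 1.

With HttpClient you can do this by passing an HttpClientHandler, the following way: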

var client = new HttpClient(new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip
                             | DecompressionMethods.Deflate
});

With the "old" WebClient you have to derive your own class to achieve a similar effect:

class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = base.GetWebRequest(address) as HttpWebRequest;
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        return request;
    }
}

Above taken from here

To use the two you would do something like this:

HttpClient

using (var client = new HttpClient(new HttpClientHandler { AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate }))
{
    // Block on the task and dispose the stream itself rather than the Task.
    using (var stream = client.GetStreamAsync(url).Result)
    {
        using (var sr = new StreamReader(stream))
        {
            using (var reader = XmlReader.Create(sr))
            {
                var feed = System.ServiceModel.Syndication.SyndicationFeed.Load(reader);
                foreach (var item in feed.Items)
                {
                    Console.WriteLine(item.Title.Text);
                }   
            }
        }
    }
}

WebClient

using (var stream = new MyWebClient().OpenRead("http://myrss.url"))
{
    using (var sr = new StreamReader(stream))
    {
        using (var reader = XmlReader.Create(sr))
        {
            var feed = System.ServiceModel.Syndication.SyndicationFeed.Load(reader);
            foreach (var item in feed.Items)
            {
                Console.WriteLine(item.Title.Text);
            }
        }
    }
}

This way you also receive the benefit of not having to call .ReadToEnd(), since you are working with the stream directly.
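To check whether a payload really is gzip before parsing it, a minimal sketch could test for the two-byte gzip signature (0x1F 0x8B) and decompress only when it is present. `GzipSniffer` and its method names are hypothetical, and the sketch assumes a runtime where `System.IO.Compression.GZipStream` is available, which may not include Windows Phone 7.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipSniffer
{
    // True if the buffer starts with the gzip magic number 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2
            && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Returns a readable stream over the payload, decompressing when needed.
    public static Stream OpenPossiblyCompressed(byte[] data)
    {
        var raw = new MemoryStream(data);
        return LooksLikeGzip(data)
            ? (Stream)new GZipStream(raw, CompressionMode.Decompress)
            : raw;
    }
}
```

The returned stream can be handed to `XmlReader.Create` or `XDocument.Load` in the same way as the streams in the examples above.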



Answer 4:

If you are having issues replacing the character

For me, replacing with a string instead of a char caused problems. I suggest testing some values both ways to see what they turn up. How you construct the value also makes a difference.

var a = x.IndexOf('\u001f');                      // 513
var b = x.IndexOf(Convert.ToString((byte)0x1F));  // -1
x = x.Replace(Convert.ToChar((byte)0x1F), ' ');   // Works
x = x.Replace(Convert.ToString((byte)0x1F), " "); // Fails

I blagged this
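The difference in the results above comes from what each conversion produces: `Convert.ToString((byte)0x1F)` formats the number in decimal and yields the two-character string "31", while `Convert.ToChar((byte)0x1F)` yields the control character itself. A minimal sketch that makes this visible (the class and method names are hypothetical):

```csharp
using System;

public static class ReplaceDemo
{
    // Searches for the text "31", so a 0x1F character is left untouched.
    public static string ReplaceViaString(string x)
    {
        return x.Replace(Convert.ToString((byte)0x1F), " ");
    }

    // Searches for the character U+001F and replaces it with a space.
    public static string ReplaceViaChar(string x)
    {
        return x.Replace(Convert.ToChar((byte)0x1F), ' ');
    }
}
```

With the input `"a\u001fb"`, `ReplaceViaString` returns the string unchanged and `ReplaceViaChar` returns `"a b"`.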



Answer 5:

I had the same issue and found that the problem was a 0x1F character reference (&#x1F;) embedded in the XML. The solution was:

s = s.Replace("&#x1F;", " ");


Answer 6:

I'd guess it's probably an encoding issue, but without seeing the XML I can't say for sure.

As for your plan to replace the character: since you have a stream rather than text, simply read the stream into a string and then remove the characters you don't want.



Answer 7:

This works for me (VB.NET):

string.Replace(Chr(31), "")


Answer 8:

I used XmlSerializer to parse XML and ran into the same exception. The problem was that the XML string contained numeric character references (HTML-style codes) for characters that are invalid in XML.

This method removes all such invalid references from a string (based on this thread: https://forums.asp.net/t/1483793.aspx?Need+a+method+that+removes+illegal+XML+characters+from+a+String):

    public static string RemoveInvalidXmlSubstrs(string xmlStr)
    {
        // Match decimal or hex references only; \S+ could run past the first ';'.
        string pattern = "&#((\\d+)|(x[0-9a-f]+));";
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        if (regex.IsMatch(xmlStr))
        {
            xmlStr = regex.Replace(xmlStr, new MatchEvaluator(m =>
            {
                string s = m.Value;
                string unicodeNumStr = s.Substring(2, s.Length - 3);

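                // The regex ignores case, so accept an upper-case 'X' prefix too.
                int unicodeNum = unicodeNumStr.StartsWith("x", StringComparison.OrdinalIgnoreCase) ?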
                Convert.ToInt32(unicodeNumStr.Substring(1), 16)
                : Convert.ToInt32(unicodeNumStr);

                //according to https://www.w3.org/TR/xml/#charsets
                if ((unicodeNum == 0x9 || unicodeNum == 0xA || unicodeNum == 0xD) ||
                ((unicodeNum >= 0x20) && (unicodeNum <= 0xD7FF)) ||
                ((unicodeNum >= 0xE000) && (unicodeNum <= 0xFFFD)) ||
                ((unicodeNum >= 0x10000) && (unicodeNum <= 0x10FFFF)))
                {
                    return s;
                }
                else
                {
                    return String.Empty;
                }
            })
            );
        }
        return xmlStr;
    }


Answer 9:

Nobody can answer this unless you show the relevant information, namely the XML content.

As general advice, I would put a breakpoint right after the ReadToEnd() call. From there you can do a few things:

  • Post the XML content here.
  • Inspect it with the Visual Studio XML visualizer.
  • Copy and paste the string into a text file and investigate it offline.