Parsing XML with C#

Posted 2019-07-06 22:07

Question:

I have an XML file as follows:

I uploaded the XML file: http://dl.dropbox.com/u/10773282/2011/result.xml . It's machine-generated XML, so you might need an XML viewer/editor.

I use this C# code to get the elements in CoverageDSPriv/Module/*.

using System;
using System.Xml;
using System.Xml.Linq;

namespace HIR {
  class Dummy {

    static void Main(String[] argv) {

      XDocument doc = XDocument.Load("result.xml");

      var coveragePriv = doc.Descendants("CoverageDSPriv"); //.First();
      var cons = coveragePriv.Elements("Module");

      foreach (var con in cons)
      {
        var id = con.Value;
        Console.WriteLine(id);
      }
    }
  }
}

Running the code, I get this result.

hello.exe6144008016161810hello.exehello.exehello.exe81061hello.exehello.exe!17main_main40030170170010180180011190190012200200013hello.exe!107testfunctiontestfunction(int)40131505001460600158080216120120017140140018AA

I expect to get

hello.exe
61440
...

However, I get just one long string on a single line.

  • Q1 : What might be wrong?
  • Q2 : How do I get the number of elements in cons? I tried cons.Count, but it doesn't work.
  • Q3 : If I need to get the nested value of <CoverageDSPriv><Module><ModuleName>, I use this code:

    var coveragePriv = doc.Descendants("CoverageDSPriv"); //.First();
    var cons = coveragePriv.Elements("Module").Elements("ModuleName");

I can live with this, but if the elements are deeply nested, I would like a more direct way to get at them. Are there any other ways to do that?

ADDED

var cons = coveragePriv.Elements("Module").Elements();

solves this issue, but for the NamespaceTable it again prints all the elements on one line.

hello.exe
61440
0
8
0
1
6
1
61810hello.exehello.exehello.exe81061hello.exehello.exe!17main_main40030170170010180180011190190012200200013hello.exe!107testfunctiontestfunction(int)40131505001460600158080216120120017140140018

Alternatively, LINQ to XML may be a better solution, as in this post.

Answer 1:

It looks to me like you only have one element named Module -- so .Value is simply returning you the InnerText of that entire element. Were you intending this instead?

coveragePriv.Elements("Module").Elements();

This would return all the child elements of the Module element, which seems to be what you're after.
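As a minimal sketch of the difference, assuming a cut-down stand-in for result.xml: only CoverageDSPriv, Module, ModuleName and NamespaceTable come from the question, while ImageSize and NamespaceName are made-up names for illustration. It prints each child of Module on its own line, and it also touches on Q2: on an IEnumerable, Count is the LINQ extension method Count(), so it needs using System.Linq and parentheses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;      // provides Count() for IEnumerable<T>
using System.Xml.Linq;

static class ModuleChildrenDemo
{
    // Hypothetical, cut-down stand-in for result.xml.
    // ImageSize and NamespaceName are assumed names, not taken from the real file.
    public const string SampleXml = @"
<CoverageDSPriv>
  <Module>
    <ModuleName>hello.exe</ModuleName>
    <ImageSize>61440</ImageSize>
    <NamespaceTable>
      <NamespaceName>hello.exe</NamespaceName>
    </NamespaceTable>
  </Module>
</CoverageDSPriv>";

    // Returns the direct children of every <Module> under <CoverageDSPriv>.
    public static IEnumerable<XElement> ModuleChildren(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("CoverageDSPriv")
                  .Elements("Module")
                  .Elements();
    }

    static void Main()
    {
        IEnumerable<XElement> cons = ModuleChildren(SampleXml);

        // Count() is an extension method, so the parentheses are required.
        Console.WriteLine(cons.Count());

        foreach (XElement con in cons)
        {
            // Value is the concatenated text of the element and all its descendants.
            Console.WriteLine($"{con.Name}: {con.Value}");
        }
    }
}
```

Calling .Value on Module itself would instead glue all of these texts together, which is the single long line shown in the question.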

Update:

<NamespaceTable> is a child of <Module> but you appear to want to handle it similarly to <Module> in that you want to write out each child element. Thus, one brute-force approach would be to add another loop for <NamespaceTable>:

foreach (var con in cons)
{
    if (con.Name == "NamespaceTable") 
    {
        foreach (var nsElement in con.Elements()) 
        {
            var nsId = nsElement.Value;
            Console.WriteLine(nsId);
        }
    }
    else
    {
        var id = con.Value;
        Console.WriteLine(id);
    }
}

Alternatively, perhaps you'd rather just flatten them altogether via .Descendants():

var cons = coveragePriv.Elements("Module").Descendants();

foreach (var con in cons)
{
    var id = con.Value;
    Console.WriteLine(id);
}
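
One caveat with the flattening approach: it also returns container elements such as <NamespaceTable>, whose Value is the concatenated text of everything beneath them, so a run-together line can still show up. A minimal sketch that keeps only leaf elements, assuming the structure below (ImageSize and NamespaceName are hypothetical names, and the sample XML is a stand-in for the real file):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LeafValuesDemo
{
    // Hypothetical stand-in for result.xml; ImageSize and NamespaceName are assumed names.
    public const string SampleXml = @"
<CoverageDSPriv>
  <Module>
    <ModuleName>hello.exe</ModuleName>
    <ImageSize>61440</ImageSize>
    <NamespaceTable>
      <NamespaceName>hello.exe</NamespaceName>
    </NamespaceTable>
  </Module>
</CoverageDSPriv>";

    // Returns the text of every element under <Module> that has no child elements.
    public static List<string> LeafValues(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("Module")
            .Descendants()
            .Where(e => !e.HasElements)   // skip containers like <NamespaceTable>
            .Select(e => e.Value)
            .ToList();
    }

    static void Main()
    {
        foreach (string value in LeafValues(SampleXml))
        {
            Console.WriteLine(value);
        }
    }
}
```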


Answer 2:

XElement.Value can give unexpected results: for an element that contains other elements, it returns the concatenated text of all its descendants. With the .NET XML APIs you are really in charge of traversing the tree yourself. If the element contains only text, Value returns what you want; if it contains child elements, not so much.

I have done a lot of XML parsing, and I find there are often better ways to handle XML, depending on what you are doing with the data.

1) You can look into XSLT transforms if you plan on outputting this data as text, more xml, or html. This is a great way to convert the data to some other readable format. We use this when we want to display our metadata on our website in html.

2) Look into XML serialization. C# makes this very easy, and it is great to use because you then work with a regular C# object when consuming the data. Microsoft even has tools to generate the serialization classes from the XML. I usually start with that, clean it up, and add my own tweaks to make it work as I wish. The best check is to serialize the object back to XML and see whether that matches what you have.

3) Try LINQ to XML. It lets you query the XML as if it were a database. It is generally a little slower, but unless you need absolute performance it works very well for minimizing your work.
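
To illustrate option 3, here is a rough sketch of querying for a nested element directly, which is also one way to approach Q3. It assumes the document has no XML namespace and uses a made-up sample in place of result.xml (ImageSize is a hypothetical element name). The first method uses LINQ query syntax; the second uses XPathSelectElements from System.Xml.XPath, which takes the whole path in one string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;   // XPathSelectElements extension method

static class QueryDemo
{
    // Hypothetical stand-in for result.xml; ImageSize is an assumed name.
    public const string SampleXml = @"
<CoverageDSPriv>
  <Module>
    <ModuleName>hello.exe</ModuleName>
    <ImageSize>61440</ImageSize>
  </Module>
</CoverageDSPriv>";

    // LINQ query syntax: select the ModuleName text of every Module that has one.
    public static List<string> ModuleNames(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        var query = from module in doc.Descendants("Module")
                    let name = (string)module.Element("ModuleName") // null if missing
                    where name != null
                    select name;
        return query.ToList();
    }

    // XPath: address the nested element with a single path expression.
    public static List<string> ModuleNamesByXPath(string xml)
    {
        return XDocument.Parse(xml)
            .XPathSelectElements("/CoverageDSPriv/Module/ModuleName")
            .Select(e => e.Value)
            .ToList();
    }

    static void Main()
    {
        foreach (string name in ModuleNames(SampleXml))
        {
            Console.WriteLine(name);
        }
        foreach (string name in ModuleNamesByXPath(SampleXml))
        {
            Console.WriteLine(name);
        }
    }
}
```

If the real file declared a default namespace, both the element names and the XPath expression would need to be namespace-qualified.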