Parsing XML into a list

Posted 2019-05-27 02:44

Question:

I have a fairly elaborate XML document and have been able to parse most of it. However, I've come across a subtree that has me stumped, and I'm afraid I'm making it harder than it needs to be. Here is the XML I'm referring to:

<Codes>
            <CustomFieldValueSet name="Account" label="Account" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>7200</Value>
                    <Description>General Supplies</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>7200</Value>
                    <Description>General Supplies</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>7200</Value>
                    <Description>General Supplies</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Activity" label="Activity" distributionType="PercentOfPrice" />
            <CustomFieldValueSet name="Chart" label="Chart" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>T</Value>
                    <Description>University</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>T</Value>
                    <Description>University</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>T</Value>
                    <Description>University</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Fund" label="Fund" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>360806</Value>
                    <Description>National Institutes of Health</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>360903</Value>
                    <Description>National  Institutes of Health</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>360957</Value>
                    <Description>National Institutes of Health</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Program" label="Program" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>02</Value>
                    <Description>Research</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>02</Value>
                    <Description>Research</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>02</Value>
                    <Description>Research</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Location" label="Location" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>015</Value>
                    <Description>Biology - Life Science</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>015</Value>
                    <Description>Biology - Life Science</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>015</Value>
                    <Description>Biology - Life Science</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
            <CustomFieldValueSet name="Organization" label="Organization" distributionType="PercentOfPrice">
                <CustomFieldValue distributionValue="10.00" splitindex="0">
                    <Value>04400</Value>
                    <Description>TUSM:Neuroscience</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="1">
                    <Value>04400</Value>
                    <Description>TUSM:Neuroscience</Description>
                </CustomFieldValue>
                <CustomFieldValue distributionValue="45.00" splitindex="2">
                    <Value>04400</Value>
                    <Description>TUSM:Neuroscience</Description>
                </CustomFieldValue>
            </CustomFieldValueSet>
        </Codes>

I'm trying to end up with a list that would look something like this:

Account distributionType   Activity   distributionValue  Fund
7200     PercentOfPrice     ""        10                 360806
7200     PercentOfPrice     ""        45                 360903
7200     PercentOfPrice     ""        45                 360957

etc...

I have written code that looks something like this. Here is a snippet; mind you, I think I have overcomplicated it.

if (tagName == "Codes")
                                {
                                  // Create another reader that contains just the accounting elements.
                                    XmlReader inner = reader.ReadSubtree();
                                    //inner.ReadToDescendant("Codes");
                                    //printOutXML(inner);
                                    while (inner.Read())
                                    {
                                        switch (inner.NodeType)
                                        {       
                                            // Walk down the XML hierarchy and fill in the values.
                                            case XmlNodeType.Element:

                                                // Read from the subtree reader; the parent reader should not be used until inner is closed.
                                                switch (inner.Name)
                                                {
                                                    case "CustomFieldValueSet":
                                                       //get the attribute that we are currently working with such as account and  
                                                        innerTagName=inner.GetAttribute("name");

                                                        // Activity and Location can be blank, so check here whether the element is empty
                                                        // and, if it is, immediately fill the list with empty strings.
                                                        if (innerTagName == "Activity")
                                                        {
                                                            if (inner.IsEmptyElement)
                                                            {   //quickly put fillers in .
                                                                for (int i = 0; i < thisInvoice.account.Count; i++)
                                                                {
                                                                    thisInvoice.activity.Add("");
                                                                }
                                                            }         
                                                        }

                                                        if (innerTagName == "Location")
                                                        {
                                                            if (inner.IsEmptyElement)
                                                            {   //quickly put fillers in .
                                                                for (int i = 0; i < thisInvoice.account.Count; i++)
                                                                {
                                                                    thisInvoice.location.Add("");
                                                                }
                                                                //thisInvoice.activity.Add("");
                                                            }
                                                        }

                                                        if (null == inner.GetAttribute("distributionType"))
                                                        {
                                                            distType = null;
                                                        }
                                                       else if
                                                       (distributionSwitch == false)
                                                        {
                                                            thisInvoice.distributionType.Add(inner.GetAttribute("distributionType") ?? "");
                                                            distType = inner.GetAttribute("distributionType") ?? "";
                                                       }
                                                        //Console.WriteLine(inner.Value);
                                                        //Console.WriteLine(inner.Name);
                                                        break;

                                                    case "CustomFieldValue":
                                                        if(null == inner.GetAttribute("distributionValue"))
                                                        //thisInvoice.distributionValue.Add(inner.GetAttribute("distributionValue") ?? "");
                                                        {/*do nothing*/}
                                                    else if
                                                        (distributionSwitch == false)
                                                        {
                                                            thisInvoice.distributionValue.Add(inner.GetAttribute("distributionValue") ?? "");
                                                        }
                                                        // If distributionType has fewer entries than distributionValue,
                                                        // pad it with the current distribution type until the lengths match.
                                                        while (thisInvoice.distributionType.Count < thisInvoice.distributionValue.Count)
                                                        {
                                                            thisInvoice.distributionType.Add(distType);
                                                        }

                                                        break;

                                                    case "Value":
                                                         // XmlNodeType.Text
                                                        if (innerTagName == "Account"/*&& inner.NodeType ==XmlNodeType.Text*/)
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.account.Add(inner.Value);
                                                        }


                                                        if (innerTagName == "Activity")
                                                        {
                                                            // Activity is not a mandatory field, so it could be empty. Check whether it is
                                                            // a self-closing tag and, if so, add an empty string instead.
                                                            if (inner.IsEmptyElement)
                                                            {
                                                                thisInvoice.activity.Add("");
                                                            }
                                                            else
                                                            {
                                                                inner.MoveToContent();// move to the text 
                                                                inner.Read();
                                                                thisInvoice.activity.Add(inner.Value);
                                                            }
                                                        }

                                                        if (innerTagName == "Location")
                                                        {
                                                            if (inner.IsEmptyElement)
                                                            {
                                                                thisInvoice.location.Add("");
                                                            }
                                                            else
                                                            {
                                                                inner.MoveToContent();// move to the text 
                                                                inner.Read();
                                                                thisInvoice.location.Add(inner.Value);
                                                            }
                                                        }

                                                        if (innerTagName == "Fund")
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.fund.Add(inner.Value);
                                                        }

                                                        if (innerTagName == "Organization")
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.org.Add(inner.Value);
                                                        }

                                                        if (innerTagName == "Program")
                                                        {
                                                            inner.MoveToContent();// move to the text 
                                                            inner.Read();
                                                            thisInvoice.prog.Add(inner.Value);
                                                        }

                                                       break;



                                                }//end switch
                                                break; // break out of the outer case.
                                            case XmlNodeType.EndElement:
                                                if (inner.Name == "CustomFieldValueSet" || inner.Value == "CustomFieldValueSet")
                                                {
                                                    distributionSwitch = true;
                                                    Console.WriteLine(inner.Value);
                                                    Console.WriteLine(inner.Name);
                                                }
                                                if (inner.Name == "Codes")
                                                {
                                                    distributionSwitch = false;
                                                    distType = null;
                                                    inner.Close();
                                                }

                                                break;
                                        }//end switch
                                    }//end while
                                }//end the if;

For the distributionType attribute, I need the list to be as long as the account list. In other words, once I have the value in a variable, I use it as filler to pad the distributionType list to the size of the account list. I can't imagine there isn't an easier way to do this. I keep looking at LINQ to XML, but it does not make much sense to me. I would love to hear how some of you experts would tackle this one; I'm trying to put together an elegant solution with a little less code. Any help would be greatly appreciated.

Answer 1:

As mentioned in the comments, an alternative to Mihai's LINQ to XML solution is to deserialize your XML into a predefined structure of typed classes and properties.

The benefit is that you then have an object that represents your XML (hopefully faithfully), which makes the data inside it much easier to work with.

With the supplied XML sample and the Edit -> Paste Special -> Paste XML as Classes menu option in Visual Studio, you will get a class structure similar to the one below (this one has been refined a bit for easier reading):

using System.Collections.Generic;
using System.Xml.Serialization;

[XmlTypeAttribute(AnonymousType = true)]
[XmlRootAttribute(Namespace = "", IsNullable = false)]
public partial class Codes
{
  [XmlElementAttribute("CustomFieldValueSet")]
  public List<CodesCustomFieldValueSet> CustomFieldValueSet { get; set; }
}

[XmlTypeAttribute(AnonymousType = true)]
public partial class CodesCustomFieldValueSet
{
  [XmlElementAttribute("CustomFieldValue")]
  public List<CodesCustomFieldValueSetCustomFieldValue> CustomFieldValue { get; set; }

  [XmlAttributeAttribute(AttributeName="name")]
  public string Name { get; set; }

  [XmlAttributeAttribute(AttributeName = "label")]
  public string Label { get; set; }

  [XmlAttributeAttribute(AttributeName = "distributionType")]
  public string DistributionType { get; set; }
}

[XmlTypeAttribute(AnonymousType = true)]
public partial class CodesCustomFieldValueSetCustomFieldValue
{
  public string Value { get; set; }

  public string Description { get; set; }

  [XmlAttributeAttribute(AttributeName = "distributionValue")]
  public decimal DistributionValue { get; set; }

  [XmlAttributeAttribute(AttributeName = "splitindex")]
  public byte SplitIndex { get; set; }
}

With this class structure, you can then deserialize your XML with the lines below (where txtInput.Text is a TextBox I used to hold the sample XML data):

// StringReader lives in System.IO; XmlSerializer lives in System.Xml.Serialization.
XmlSerializer serializer = new XmlSerializer(typeof(Codes));
Codes codesInput = serializer.Deserialize(new StringReader(txtInput.Text)) as Codes;

if (codesInput != null)
{
  // Do something with the data
}

NOTE:
The deserialized object mirrors the structure of the sample XML, not your desired output, so you will still need to transform it. For that, I would recommend creating an additional class, held in a List<T>, that carries all the information shown in your desired output.

Even better, if you control the XML's structure, would be to restructure it so that it is more self-explanatory. As it stands, the only link between the CustomFieldValueSet elements appears to be splitindex, an attribute of their child nodes, which complicates things considerably.
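The transformation mentioned in the note could look roughly like the sketch below. It is only an illustration, not a definitive implementation: SplitRow and CodesPivot are hypothetical names that do not appear anywhere in the question, and the three XML classes are trimmed copies of the ones above, repeated only so the block compiles on its own. It assumes every CustomFieldValueSet has a name attribute and that the distribution type and value are the same across sets for a given splitindex, as they are in the sample.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Serialization;

// Trimmed copies of the classes above, repeated only so this sketch is self-contained.
public class Codes
{
    [XmlElement("CustomFieldValueSet")]
    public List<CodesCustomFieldValueSet> CustomFieldValueSet { get; set; }
}

public class CodesCustomFieldValueSet
{
    [XmlElement("CustomFieldValue")]
    public List<CodesCustomFieldValueSetCustomFieldValue> CustomFieldValue { get; set; }

    [XmlAttribute("name")]
    public string Name { get; set; }

    [XmlAttribute("distributionType")]
    public string DistributionType { get; set; }
}

public class CodesCustomFieldValueSetCustomFieldValue
{
    public string Value { get; set; }

    [XmlAttribute("distributionValue")]
    public decimal DistributionValue { get; set; }

    [XmlAttribute("splitindex")]
    public byte SplitIndex { get; set; }
}

// Hypothetical row type: one instance per splitindex.
public class SplitRow
{
    public byte SplitIndex { get; set; }
    public string DistributionType { get; set; }
    public decimal DistributionValue { get; set; }

    // Maps a set name ("Account", "Fund", ...) to its <Value> text for this split.
    public Dictionary<string, string> Fields { get; } = new Dictionary<string, string>();
}

public static class CodesPivot
{
    public static List<SplitRow> ToRows(Codes codes)
    {
        var sets = codes.CustomFieldValueSet ?? new List<CodesCustomFieldValueSet>();
        var rows = new SortedDictionary<byte, SplitRow>();

        foreach (var set in sets)
        {
            var values = set.CustomFieldValue
                         ?? new List<CodesCustomFieldValueSetCustomFieldValue>();
            foreach (var value in values)
            {
                if (!rows.TryGetValue(value.SplitIndex, out var row))
                {
                    // Assumption: the first set seen for a split supplies its type and percentage.
                    row = new SplitRow
                    {
                        SplitIndex = value.SplitIndex,
                        DistributionType = set.DistributionType,
                        DistributionValue = value.DistributionValue
                    };
                    rows.Add(value.SplitIndex, row);
                }
                row.Fields[set.Name] = value.Value ?? "";
            }
        }

        // Sets without values (such as Activity in the sample) become empty strings.
        foreach (var row in rows.Values)
            foreach (var set in sets)
                if (!row.Fields.ContainsKey(set.Name))
                    row.Fields[set.Name] = "";

        return rows.Values.ToList();
    }
}
```

After deserializing, calling CodesPivot.ToRows(codesInput) would give one SplitRow per splitindex, ordered by index, from which a line of the desired output can be printed by reading row.Fields["Account"], row.Fields["Fund"], and so on.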

Further reading on XML Serialization:
MSDN: Introducing XML Serialization
The XmlSerializer Class



Answer 2:

You can use LINQ to XML for this.

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        // This text file contains your XML.
        var xml_sample = File.ReadAllText("xml_sample.txt");
        var doc = XDocument.Parse(xml_sample);

        // Get every <CustomFieldValueSet> whose label attribute is "Account".
        // Casting an XAttribute to string yields null instead of throwing when it is missing.
        var accounts = from item in doc.Descendants("Codes").Descendants("CustomFieldValueSet")
                       where (string)item.Attribute("label") == "Account"
                       select item;

        // Create an anonymous type containing the value of the
        // distributionValue attribute and the <Value> node.
        var accountValue = from el in accounts.Descendants("CustomFieldValue")
                           let distAttribute = el.Attribute("distributionValue")
                           select new
                           {
                               distValue = distAttribute != null ? distAttribute.Value : "0",
                               value = el.Descendants("Value").First().Value,
                           };

        // Display the results just to make sure we got it right.
        accounts.ToList().ForEach(el =>
            Console.WriteLine(el.Name + " " + (string)el.Attribute("distributionType")));

        accountValue.ToList().ForEach(el =>
            Console.WriteLine(el.distValue + ":" + el.value));
    }
}

You should be able to use these ideas to parse your XML file as needed.
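As one possible way to extend these ideas to the full table from the question, the sketch below groups every CustomFieldValue by its splitindex and emits one row per split. Treat it as a sketch under stated assumptions rather than a finished solution: CodesTable and Build are hypothetical names, rows are plain dictionaries keyed by set name plus "distributionType" and "distributionValue", and it assumes that set names are present and unique and that the distribution attributes agree across sets for a given splitindex.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CodesTable
{
    // Hypothetical helper: builds one dictionary per splitindex from a <Codes> element.
    public static List<Dictionary<string, string>> Build(XElement codes)
    {
        var sets = codes.Elements("CustomFieldValueSet").ToList();

        // Assumption: every set has a unique, non-null name attribute.
        var names = sets.Select(s => (string)s.Attribute("name")).ToList();

        return sets
            // Flatten to one record per <CustomFieldValue>, remembering its parent set.
            .SelectMany(s => s.Elements("CustomFieldValue"), (s, v) => new
            {
                Name = (string)s.Attribute("name"),
                DistType = (string)s.Attribute("distributionType") ?? "",
                DistValue = (string)v.Attribute("distributionValue") ?? "",
                Split = (int?)v.Attribute("splitindex") ?? 0,
                Value = (string)v.Element("Value") ?? ""
            })
            // The splitindex is what ties values from different sets together.
            .GroupBy(x => x.Split)
            .OrderBy(g => g.Key)
            .Select(g =>
            {
                // Start with "" for every set so empty sets (e.g. Activity) still get a column.
                var row = names.ToDictionary(n => n, n => "");
                foreach (var x in g)
                {
                    row[x.Name] = x.Value;
                }
                row["distributionType"] = g.First().DistType;
                row["distributionValue"] = g.First().DistValue;
                return row;
            })
            .ToList();
    }
}
```

With the doc variable from the example above, CodesTable.Build(doc.Descendants("Codes").First()) would return the rows in splitindex order, and each column is then available as row["Account"], row["Fund"], row["distributionValue"], and so on. Grouping by splitindex also removes the need to pad any list by hand, because every row starts out with an empty string for each set.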