Question:

I have an XML document like the following:

<menuitem navigateurl="/PressCentre/" text="&#1087;&#1088;&#1077;&#1089; &#1094;&#1077;&#1085;&#1090;&#1098;&#1088;">
    <menuitem navigateurl="/PressCentre/RegisterForPressAlerts/" text="&#1088;&#1077;&#1075;&#1080;&#1089;&#1090;&#1098;&#1088; &#1079;&#1072; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    <menuitem navigateurl="/PressCentre/PressReleases/" text="&#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;">
        <menuitem navigateurl="/PressCentre/PressReleases/PressReleasesArchive/" text="&#1072;&#1088;&#1093;&#1080;&#1074; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressKit/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1084;&#1087;&#1083;&#1077;&#1082;&#1090;">
        <menuitem navigateurl="/PressCentre/PressKit/FactSheets/" text="&#1089;&#1087;&#1080;&#1089;&#1098;&#1082; &#1092;&#1072;&#1082;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/ExpertComments/" text="&#1082;&#1086;&#1084;&#1077;&#1085;&#1090;&#1072;&#1088;&#1080; &#1085;&#1072; &#1077;&#1082;&#1089;&#1087;&#1077;&#1088;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/Testimonials/" text="&#1087;&#1088;&#1077;&#1087;&#1086;&#1088;&#1098;&#1082;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/MediaFiles/" text="&#1084;&#1077;&#1076;&#1080;&#1103; &#1092;&#1072;&#1081;&#1083;&#1086;&#1074;&#1077;" />
        <menuitem navigateurl="/PressCentre/PressKit/Photography/" text="&#1089;&#1085;&#1080;&#1084;&#1082;&#1080;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressContacts/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1080;" />
</menuitem>

I need to get the value of each navigateurl attribute (e.g. "/PressCentre/"). Is there a well-known regex for this?

Thanks

Answer 1:

A basic recursion (not tested but I think it's ok):

// Requires: using System.Xml.XPath;
private void Caller(String filepath)
{
    XPathDocument oDoc = new XPathDocument(filepath);
    ReadNodes(oDoc.CreateNavigator());
}

private void ReadNodes(XPathNavigator nav)
{
    XPathNodeIterator nodes = nav.Select("menuitem");
    while (nodes.MoveNext())
    {
        //A - read the attribute
        string url = nodes.Current.GetAttribute("navigateurl", string.Empty);

        //B - do something with the data

        //C - recurse
        ReadNodes(nodes.Current);
    }
}

This works because an XPathNodeIterator's Current property is itself an XPathNavigator, so it can be passed straight back into ReadNodes. You would need to extend this to push the data into a dictionary, keep track of depth, or whatever else you need.
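For comparison, the same read, process, recurse pattern can be sketched in Java with the standard DOM API. This is only an illustration, not a port of the answer's code: the class name MenuWalker, the "depth:url" output format, and the shortened sample document are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: walks nested <menuitem> elements recursively.
final class MenuWalker {

    // Returns one "depth:url" entry per menuitem, in document order.
    static List<String> collect(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> out = new ArrayList<>();
        if ("menuitem".equals(root.getNodeName())) {
            visit(root, 0, out);
        }
        return out;
    }

    private static void visit(Element item, int depth, List<String> out) {
        // A - read the attribute
        String url = item.getAttribute("navigateurl");
        // B - do something with the data (here: record it with its depth)
        out.add(depth + ":" + url);
        // C - recurse into direct <menuitem> children only
        for (Node child = item.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && "menuitem".equals(child.getNodeName())) {
                visit((Element) child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample in the shape of the question's document.
        String xml = "<menuitem navigateurl=\"/PressCentre/\" text=\"a\">"
                + "<menuitem navigateurl=\"/PressCentre/PressReleases/\" text=\"b\">"
                + "<menuitem navigateurl=\"/PressCentre/PressReleases/PressReleasesArchive/\" text=\"c\" />"
                + "</menuitem>"
                + "<menuitem navigateurl=\"/PressCentre/PressContacts/\" text=\"d\" />"
                + "</menuitem>";
        for (String line : collect(xml)) {
            System.out.println(line);
        }
    }
}
```

Passing the depth down as a parameter is one simple way to keep track of nesting; a dictionary keyed by URL would slot in at step B.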

Answer 2:

Why use Regex for this when XPath is (to me, at least) the natural choice? That's basically what XSLT should implement...

Answer 3:

Any particular reason you're using a regex? Have you tried using XPath for this? Here are some examples of how to use XPath. http://www.w3schools.com/XPath/xpath_examples.asp

Answer 4:

Use XPath: //menuitem[@navigateurl]/@navigateurl

This XPath grabs all the menu items that have a navigateurl attribute and returns a node-set (XPath 1.0) or sequence (XPath 2.0) of navigateurl values. The [@navigateurl] predicate only restricts the match to menu items that actually carry the attribute; it does not limit the result to leaf items, so parent and child menu items are both returned.
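As a rough sketch of how that expression could be evaluated, here is a Java version using the standard javax.xml.xpath API. The class name NavigateUrlXPath and the shortened sample document are assumptions for illustration; the expression itself is the one from this answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates the answer's XPath against an XML string.
final class NavigateUrlXPath {

    static List<String> navigateUrls(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET asks for every matching attribute node, not just the first one.
        NodeList attrs = (NodeList) xpath.evaluate(
                "//menuitem[@navigateurl]/@navigateurl",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            urls.add(attrs.item(i).getNodeValue());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Shortened sample in the shape of the question's document.
        String xml = "<menuitem navigateurl=\"/PressCentre/\" text=\"a\">"
                + "<menuitem navigateurl=\"/PressCentre/PressKit/\" text=\"b\">"
                + "<menuitem navigateurl=\"/PressCentre/PressKit/FactSheets/\" text=\"c\" />"
                + "</menuitem>"
                + "</menuitem>";
        System.out.println(navigateUrls(xml));
    }
}
```

Because the parser resolves character references and handles quoting, this avoids the edge cases a hand-written regex would have to cover.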

Answer 5:

My post addresses a specific need related to the OP's inquiry, but not exactly what the OP asked. I love both regex and recursion when I need them, but in this case I think the goal of the OP's inquiry was to learn a way to generate properly formatted XML output. What I've provided below does exactly that without heavy custom development (why reinvent the wheel?) and is supported as far back as the .NET 2.0 framework.

In my work, I often end up supporting modern government systems. Those systems often still only support up through .NET 2.0 on deployment machines, primarily for security reasons. The 2.0 Framework lacks some of the graceful output of more recent .NET editions, particularly where XML objects are concerned. The fully validated set of methods below has been valuable and time-saving to me, and I offer it for fellow developers who also service government interests.

Additionally, you can use the LinqBridge library for limited LINQ support. .NET up through 3.5 and its service pack still runs on the 2.0 runtime internally, so LinqBridge was built to bridge that specific gap: limited LINQ query support while targeting a 2.0 build from Visual Studio 2008. However, note that LinqBridge is currently not supported beyond Visual Studio 2008.

To minimize package publish sizes and stay compatible with the requirements of the organizations I work for, I avoid non-XML libraries (such as Regex) for parsing XML and stick to standard XML objects, specifically the older Xml*-prefixed objects rather than the more modern (and much more flexible) X*-prefixed objects.

Below I provide several safe, simple, efficient methods that generate formatted XML from an assortment of standard 2.0 Xml* objects. Note that the workhorse for these functions is really the XPathNavigator class, not its cousins.

Here is a C# code fragment that calls the sample methods:

// Requires: using System.Text; using System.Xml; using System.Xml.XPath;
// Input_FilePath and Out(...) are the caller's own input path and output helper.
XmlDocument doc = new XmlDocument();
doc.Load(Input_FilePath);
StringBuilder sb = StringBuilderFromXmlDocument(doc);
Out(sb);
sb = StringBuilderFromXPathDocument(new XPathDocument(Input_FilePath));
Out(sb);
sb = StringBuilderFromXPathNavigator(new XPathDocument(Input_FilePath).CreateNavigator());
Out(sb);
string ss = StringFromXmlDocument(doc);
Out(ss);
ss = StringFromXPathDocument(new XPathDocument(Input_FilePath));
Out(ss);
ss = StringFromXPathNavigator(new XPathDocument(Input_FilePath).CreateNavigator());
Out(ss);

And here are the sample methods, one of which will likely suffice for your XML formatting needs:

public static StringBuilder StringBuilderFromXmlDocument(XmlDocument _xd)
{
    XPathNavigator _xpn;
    try
    {
        _xpn = _xd.CreateNavigator();
    }
    catch
    {
        _xd.LoadXml(DEFAULT_ERROR_TEXT);
        _xpn = _xd.CreateNavigator();
    }
    return StringBuilderFromXPathNavigator(_xpn);
}

private static StringBuilder StringBuilderFromXPathDocument(XPathDocument _xpd)
{
    StringBuilder returnVal = new StringBuilder();
    XPathNavigator _xpn;
    try
    {
        _xpn = _xpd.CreateNavigator();
        returnVal.AppendLine(_xpn.OuterXml.Trim());
    }
    catch
    {
        returnVal = new StringBuilder()
            .Append(DEFAULT_ERROR_TEXT);
    }
    return returnVal;
}

private static StringBuilder StringBuilderFromXPathNavigator(XPathNavigator _xpn)
{
    StringBuilder returnVal = new StringBuilder();
    try
    {
        returnVal.AppendLine(_xpn.OuterXml.Trim());
    }
    catch
    {
        returnVal = new StringBuilder()
            .Append(DEFAULT_ERROR_TEXT);
    }
    return returnVal;
}

public static string StringFromXmlDocument(XmlDocument _xd)
{
    XPathNavigator _xpn;
    try
    {
        _xpn = _xd.CreateNavigator();
    }
    catch
    {
        _xd.LoadXml(DEFAULT_ERROR_TEXT);
        _xpn = _xd.CreateNavigator();
    }
    return StringFromXPathNavigator(_xpn);
}

private static string StringFromXPathNavigator(XPathNavigator _xpn)
{
    string returnVal;
    try
    {
        returnVal = _xpn.OuterXml.Trim();
    }
    catch
    {
        returnVal = DEFAULT_ERROR_TEXT;
    }
    return returnVal;
}

private static string StringFromXPathDocument(XPathDocument _xpd)
{
    string returnVal;
    XPathNavigator _xpn;
    try
    {
        _xpn = _xpd.CreateNavigator();
        returnVal = _xpn.OuterXml.Trim();
    }
    catch
    {
        returnVal = DEFAULT_ERROR_TEXT;
    }
    return returnVal;
}

enjoy. ^^

Note that in later Framework editions and using newer XElement objects you can foreach(){} the XElement's nodes and .ToString() each result for automated proper formatting. Like I said above, much more graceful :).

Answer 6:

How to Recursively read an XML document using regex in Java

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name is illustrative; the original snippet showed only the methods.
public class DslpnExtractor {

    // Matches <tag>text</tag> where the text contains no nested elements.
    private static final Pattern PATTERN_1 = Pattern.compile("<([^<>]+)>([^<>]+)</\\1>");

    public static void main(String[] args) {
        String data = "<CheckExistingDSLService>" +
                "<DSLPN>4137361787</DSLPN>" +
                "<DSLPN>8566944014</DSLPN>" +
                "<ClientRequestId>CRID</ClientRequestId>" +
                "<DSLPN>8566944024</DSLPN>" +
                "<ClientSystemId>SSPORD</ClientSystemId>" +
                "<Authentication>" +
                "<Id>SSPORD</Id>" +
                "</Authentication>" +
                "<Comment>Service to check CheckExistingDSL</Comment>" +
                "</CheckExistingDSLService>";
        System.out.print("The data is " + listDataElements(data));
    }

    // Collects the numeric text of every <DSLPN> element.
    private static List<String> listDataElements(CharSequence cs) {
        List<String> list = new ArrayList<String>();
        Matcher matcher = PATTERN_1.matcher(cs);
        while (matcher.find()) {
            if (matcher.group(1).equalsIgnoreCase("DSLPN")) {
                try {
                    Long number = Long.parseLong(matcher.group(2));
                    list.add(number.toString());
                } catch (NumberFormatException e) {
                    System.out.println("Skipping non-numeric DSLPN value");
                }
            }
        }
        return list;
    }
}

The output you will get: The data is [4137361787, 8566944014, 8566944024]
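The same regex idea can be pointed at the attribute the question actually asks about. The following is a minimal sketch, not a robust parser: the class name NavigateUrlRegex and the pattern are assumptions for illustration, and the pattern only handles double-quoted values. It will also match inside comments or CDATA and does not decode character references, which is why the XPath answers above are generally the safer route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls navigateurl="..." values out of raw XML text.
final class NavigateUrlRegex {

    // Group 1 captures everything between the double quotes.
    private static final Pattern NAVIGATE_URL =
            Pattern.compile("\\bnavigateurl\\s*=\\s*\"([^\"]*)\"");

    static List<String> navigateUrls(CharSequence xml) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = NAVIGATE_URL.matcher(xml);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Shortened sample in the shape of the question's document.
        String xml = "<menuitem navigateurl=\"/PressCentre/\" text=\"a\">"
                + "<menuitem navigateurl=\"/PressCentre/PressContacts/\" text=\"b\" />"
                + "</menuitem>";
        System.out.println(navigateUrls(xml));
    }
}
```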