Recursively reading an xml document and using rege

2020-03-07 06:55发布

I have an xml document like the following:

<menuitem navigateurl="/PressCentre/" text="&#1087;&#1088;&#1077;&#1089; &#1094;&#1077;&#1085;&#1090;&#1098;&#1088;">
    <menuitem navigateurl="/PressCentre/RegisterForPressAlerts/" text="&#1088;&#1077;&#1075;&#1080;&#1089;&#1090;&#1098;&#1088; &#1079;&#1072; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    <menuitem navigateurl="/PressCentre/PressReleases/" text="&#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;">
        <menuitem navigateurl="/PressCentre/PressReleases/PressReleasesArchive/" text="&#1072;&#1088;&#1093;&#1080;&#1074; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressKit/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1084;&#1087;&#1083;&#1077;&#1082;&#1090;">
        <menuitem navigateurl="/PressCentre/PressKit/FactSheets/" text="&#1089;&#1087;&#1080;&#1089;&#1098;&#1082; &#1092;&#1072;&#1082;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/ExpertComments/" text="&#1082;&#1086;&#1084;&#1077;&#1085;&#1090;&#1072;&#1088;&#1080; &#1085;&#1072; &#1077;&#1082;&#1089;&#1087;&#1077;&#1088;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/Testimonials/" text="&#1087;&#1088;&#1077;&#1087;&#1086;&#1088;&#1098;&#1082;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/MediaFiles/" text="&#1084;&#1077;&#1076;&#1080;&#1103; &#1092;&#1072;&#1081;&#1083;&#1086;&#1074;&#1077;" />
        <menuitem navigateurl="/PressCentre/PressKit/Photography/" text="&#1089;&#1085;&#1080;&#1084;&#1082;&#1080;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressContacts/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1080;" />
</menuitem>

I need to get the value between navigateurl (e.g. "/PressCentre"). Is there a well known regex script to do this?

Thanks

标签: xml regex xslt
6条回答
来,给爷笑一个
2楼-- · 2020-03-07 07:08

My post addresses a specific need related to the OP's inquiry, but not specifically what the OP asked. I love both Regex and recursion when I need them, but in this case I think the goal of the OP's inquiry was to learn a way to generate properly-formatted XML output, and what I've provided below does exactly that with no heavy contextual source development (why reinvent the wheel?) and is supported in back in the .NET 2.0 framework.

In my work, I often end up supporting modern government systems. Those systems often still only support up through 2.0 on deployment systems -- primarily for reasons of security. The 2.0 Framework lacks some of the graceful output of more recent .NET editions, particularly where XML objects are concerned. The fully validated method-set below has been valuable and time-saving to me and I offer it for unseen developer comrades who also service government interests.

Additionally you can also utilize LinqBridge libraries for limited Linq support (.NET up through 3.5 service-pack actually internally self-evaluates to 2.0 so LinqBridge was constructed to bridge that specific gap (limited Linq query support while developing to 2.0 build while using Visual Studio 2008). However, note that LinqBridge is currently not supported forward of Visual Studio 2008.

In order to minimize package publish-sizes and also stay compatible with the organizational requirements where I provide my services I avoid using associative non-XML libraries (such as Regex) for parsing XML and stick to standard XML objects. Specifically the older Xml*-prefix objects vs the more modern (and much more flexible) X*-prefix objects...

Below I provide numerous safe, simple, efficient methods that generate formatted XML from an assortment of standard 2.0 Xml* objects. Also note that the workhorse for these functions is really the XPathNavigator class, not it's cousins.

Here is a C# code fragment that calls the sample methods:

doc = new XmlDocument();
doc.Load(Input_FilePath);
sb = StringBuilderFromXmlDocument(doc);
Out(sb);
sb = StringBuilderFromXPathDocument(new XPathDocument(Input_FilePath));
Out(sb);
sb = StringBuilderFromXPathNavigator(new XPathDocument(Input_FilePath).CreateNavigator());
Out(sb);
ss = StringFromXmlDocument(doc);
Out(ss);
ss = StringFromXPathDocument(new XPathDocument(Input_FilePath));
Out(ss);
ss = StringFromXPathNavigator(new XPathDocument(Input_FilePath).CreateNavigator());
Out(ss);

and here are the sample methods, one of which will likely suffice your XML formatting needs:

public static StringBuilder StringBuilderFromXmlDocument(XmlDocument _xd)
{
    XPathNavigator _xpn;
    try
    {
        _xpn = _xd.CreateNavigator();
    }
    catch
    {
        _xd.LoadXml(DEFAULT_ERROR_TEXT);
        _xpn = _xd.CreateNavigator();
    }
    return StringBuilderFromXPathNavigator(_xpn);
}

private static StringBuilder StringBuilderFromXPathDocument(XPathDocument _xpd)
{
    StringBuilder returnVal = new StringBuilder();
    XPathNavigator _xpn;
    try
    {
        _xpn = _xpd.CreateNavigator();
        returnVal.AppendLine(_xpn.OuterXml.Trim());
    }
    catch
    {
        returnVal = new StringBuilder()
            .Append(DEFAULT_ERROR_TEXT);
    }
    return returnVal;
}

private static StringBuilder StringBuilderFromXPathNavigator(XPathNavigator _xpn)
{
    StringBuilder returnVal = new StringBuilder();
    try
    {
        returnVal.AppendLine(_xpn.OuterXml.Trim());
    }
    catch
    {
        returnVal = new StringBuilder()
            .Append(DEFAULT_ERROR_TEXT);
    }
    return returnVal;
}

public static string StringFromXmlDocument(XmlDocument _xd)
{
    XPathNavigator _xpn;
    try
    {
        _xpn = _xd.CreateNavigator();
    }
    catch
    {
        _xd.LoadXml(DEFAULT_ERROR_TEXT);
        _xpn = _xd.CreateNavigator();
    }
    return StringFromXPathNavigator(_xpn);
}

private static string StringFromXPathNavigator(XPathNavigator _xpn)
{
    string returnVal;
    try
    {
        returnVal = _xpn.OuterXml.Trim();
    }
    catch
    {
        returnVal = DEFAULT_ERROR_TEXT;
    }
    returnVal = _xpn.OuterXml.Trim();
    return returnVal;
}

private static string StringFromXPathDocument(XPathDocument _xpd)
{
    string returnVal;
    XPathNavigator _xpn;
    try
    {
        _xpn = _xpd.CreateNavigator();
        returnVal = _xpn.OuterXml.Trim();
    }
    catch
    {
        returnVal = DEFAULT_ERROR_TEXT;
    }
    return returnVal;
}

enjoy. ^^

Note that in later Framework editions and using newer XElement objects you can foreach(){} the XElement's nodes and .ToString() each result for automated proper formatting. Like I said above, much more graceful :).

查看更多
再贱就再见
3楼-- · 2020-03-07 07:08

How to Recursively read an XML document using regex in Java

public static void main(String[] args) {
        String data**="<CheckExistingDSLService>" +
                "<DSLPN>4137361787</DSLPN>" +
                "<DSLPN>8566944014</DSLPN>" +
                "<ClientRequestId>CRID</ClientRequestId>" +
                "<DSLPN>8566944024</DSLPN>" +
                "<ClientSystemId>SSPORD</ClientSystemId>" +
                "<Authentication>" +
                "<Id>SSPORD</Id>" +
                "</Authentication>" +
                "<Comment>Service to check CheckExistingDSL</Comment>"** +
                "</CheckExistingDSLService>";
        System.out.print("The dats is "+listDataElements(data));

    }
    private static final Pattern PATTERN_1 = Pattern.compile("<([^<>]+)>([^<>]+)</\\1>"); 
    private static List<String> listDataElements(CharSequence cs) {     
        List<String> list = new ArrayList<String>();     
        Matcher matcher = PATTERN_1.matcher(cs);    
        while (matcher.find()) {         
            if(matcher.group(1).equalsIgnoreCase("DSLPN")){
                try{
                    Long number=Long.parseLong(matcher.group(2));
                    list.add(number.toString());

                }catch(Exception e){
                    System.out.println("Do noting this is notnumber ");                 
                }
            }
        } return list; 
    }

The Output you will get: The date is [4137361787, 8566944014, 8566944024]

查看更多
祖国的老花朵
4楼-- · 2020-03-07 07:25

A basic recursion (not tested but I think it's ok):

private void Caller(String filepath)
{
    XPathDocument oDoc = new XPathDocument(filepath);
    Readnodes( oDoc.CreateNavigator() );
}

private void ReadNodes(XPathNavigator nav)
{
    XPathNodeIterator nodes = nav.Select("menuitem");
    while (nodes.MoveNext())
    {
        //A - read the attribute
        string url = nodes.Current.GetAttribute("navigateurl", string.Empty);

        //B - do something with the data

        //C - recurse
        ReadNodes(nodes.Current);
    }
}

...works because an XPathNodeIterator's Current property is also an XPathNavigator. Obviously you'd need to extend this to push data to a dictionary or keep track of depth or whatever.

查看更多
Explosion°爆炸
5楼-- · 2020-03-07 07:29

Use xpath, //menuitem[@navigateurl]/@navigateurl .

This xpath will grab all the menu items which have an attribute naviagate url and return a node-list (xpath 1.0) or sequence (xpath 2.0) of navigateurl values. By having the navigateurl attribute predicate, that ensures that only the leaf menu items are fetched.

查看更多
我想做一个坏孩纸
6楼-- · 2020-03-07 07:32

Why use Regex for this when XPath is (to me, at least) the natural choice? That's basically what XSLT should implement...

查看更多
我只想做你的唯一
7楼-- · 2020-03-07 07:33

Any particular reason you're using a regex? Have you tried using XPath for this? Here are some examples of how to use XPath. http://www.w3schools.com/XPath/xpath_examples.asp

查看更多
登录 后发表回答