Recursively reading an xml document and using rege

I have an xml document like the following:

<menuitem navigateurl="/PressCentre/" text="&#1087;&#1088;&#1077;&#1089; &#1094;&#1077;&#1085;&#1090;&#1098;&#1088;">
    <menuitem navigateurl="/PressCentre/RegisterForPressAlerts/" text="&#1088;&#1077;&#1075;&#1080;&#1089;&#1090;&#1098;&#1088; &#1079;&#1072; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    <menuitem navigateurl="/PressCentre/PressReleases/" text="&#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;">
        <menuitem navigateurl="/PressCentre/PressReleases/PressReleasesArchive/" text="&#1072;&#1088;&#1093;&#1080;&#1074; &#1087;&#1088;&#1077;&#1089; &#1089;&#1098;&#1086;&#1073;&#1097;&#1077;&#1085;&#1080;&#1103;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressKit/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1084;&#1087;&#1083;&#1077;&#1082;&#1090;">
        <menuitem navigateurl="/PressCentre/PressKit/FactSheets/" text="&#1089;&#1087;&#1080;&#1089;&#1098;&#1082; &#1092;&#1072;&#1082;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/ExpertComments/" text="&#1082;&#1086;&#1084;&#1077;&#1085;&#1090;&#1072;&#1088;&#1080; &#1085;&#1072; &#1077;&#1082;&#1089;&#1087;&#1077;&#1088;&#1090;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/Testimonials/" text="&#1087;&#1088;&#1077;&#1087;&#1086;&#1088;&#1098;&#1082;&#1080;" />
        <menuitem navigateurl="/PressCentre/PressKit/MediaFiles/" text="&#1084;&#1077;&#1076;&#1080;&#1103; &#1092;&#1072;&#1081;&#1083;&#1086;&#1074;&#1077;" />
        <menuitem navigateurl="/PressCentre/PressKit/Photography/" text="&#1089;&#1085;&#1080;&#1084;&#1082;&#1080;" />
    </menuitem>
    <menuitem navigateurl="/PressCentre/PressContacts/" text="&#1087;&#1088;&#1077;&#1089; &#1082;&#1086;&#1085;&#1090;&#1072;&#1082;&#1090;&#1080;" />
</menuitem>

I need to get the value between navigateurl (e.g. "/PressCentre"). Is there a well known regex script to do this?

Thanks

标签： xml regex xslt

6条回答

来，给爷笑一个

2楼-- · 2020-03-07 07:08

My post addresses a specific need related to the OP's inquiry, but not specifically what the OP asked. I love both Regex and recursion when I need them, but in this case I think the goal of the OP's inquiry was to learn a way to generate properly-formatted XML output, and what I've provided below does exactly that with no heavy contextual source development (why reinvent the wheel?) and is supported in back in the .NET 2.0 framework.

In my work, I often end up supporting modern government systems. Those systems often still only support up through 2.0 on deployment systems -- primarily for reasons of security. The 2.0 Framework lacks some of the graceful output of more recent .NET editions, particularly where XML objects are concerned. The fully validated method-set below has been valuable and time-saving to me and I offer it for unseen developer comrades who also service government interests.

Additionally you can also utilize LinqBridge libraries for limited Linq support (.NET up through 3.5 service-pack actually internally self-evaluates to 2.0 so LinqBridge was constructed to bridge that specific gap (limited Linq query support while developing to 2.0 build while using Visual Studio 2008). However, note that LinqBridge is currently not supported forward of Visual Studio 2008.

In order to minimize package publish-sizes and also stay compatible with the organizational requirements where I provide my services I avoid using associative non-XML libraries (such as Regex) for parsing XML and stick to standard XML objects. Specifically the older Xml*-prefix objects vs the more modern (and much more flexible) X*-prefix objects...

Below I provide numerous safe, simple, efficient methods that generate formatted XML from an assortment of standard 2.0 Xml* objects. Also note that the workhorse for these functions is really the XPathNavigator class, not it's cousins.

Here is a C# code fragment that calls the sample methods:

doc = new XmlDocument();
doc.Load(Input_FilePath);
sb = StringBuilderFromXmlDocument(doc);
Out(sb);
sb = StringBuilderFromXPathDocument(new XPathDocument(Input_FilePath));
Out(sb);
sb = StringBuilderFromXPathNavigator(new XPathDocument(Input_FilePath).CreateNavigator());
Out(sb);
ss = StringFromXmlDocument(doc);
Out(ss);
ss = StringFromXPathDocument(new XPathDocument(Input_FilePath));
Out(ss);
ss = StringFromXPathNavigator(new XPathDocument(Input_FilePath).CreateNavigator());
Out(ss);

and here are the sample methods, one of which will likely suffice your XML formatting needs:

public static StringBuilder StringBuilderFromXmlDocument(XmlDocument _xd)
{
    XPathNavigator _xpn;
    try
    {
        _xpn = _xd.CreateNavigator();
    }
    catch
    {
        _xd.LoadXml(DEFAULT_ERROR_TEXT);
        _xpn = _xd.CreateNavigator();
    }
    return StringBuilderFromXPathNavigator(_xpn);
}

private static StringBuilder StringBuilderFromXPathDocument(XPathDocument _xpd)
{
    StringBuilder returnVal = new StringBuilder();
    XPathNavigator _xpn;
    try
    {
        _xpn = _xpd.CreateNavigator();
        returnVal.AppendLine(_xpn.OuterXml.Trim());
    }
    catch
    {
        returnVal = new StringBuilder()
            .Append(DEFAULT_ERROR_TEXT);
    }
    return returnVal;
}

private static StringBuilder StringBuilderFromXPathNavigator(XPathNavigator _xpn)
{
    StringBuilder returnVal = new StringBuilder();
    try
    {
        returnVal.AppendLine(_xpn.OuterXml.Trim());
    }
    catch
    {
        returnVal = new StringBuilder()
            .Append(DEFAULT_ERROR_TEXT);
    }
    return returnVal;
}

public static string StringFromXmlDocument(XmlDocument _xd)
{
    XPathNavigator _xpn;
    try
    {
        _xpn = _xd.CreateNavigator();
    }
    catch
    {
        _xd.LoadXml(DEFAULT_ERROR_TEXT);
        _xpn = _xd.CreateNavigator();
    }
    return StringFromXPathNavigator(_xpn);
}

private static string StringFromXPathNavigator(XPathNavigator _xpn)
{
    string returnVal;
    try
    {
        returnVal = _xpn.OuterXml.Trim();
    }
    catch
    {
        returnVal = DEFAULT_ERROR_TEXT;
    }
    returnVal = _xpn.OuterXml.Trim();
    return returnVal;
}

private static string StringFromXPathDocument(XPathDocument _xpd)
{
    string returnVal;
    XPathNavigator _xpn;
    try
    {
        _xpn = _xpd.CreateNavigator();
        returnVal = _xpn.OuterXml.Trim();
    }
    catch
    {
        returnVal = DEFAULT_ERROR_TEXT;
    }
    return returnVal;
}

enjoy. ^^

Note that in later Framework editions and using newer XElement objects you can foreach(){} the XElement's nodes and .ToString() each result for automated proper formatting. Like I said above, much more graceful :).

0人赞添加讨论(0) 举报

再贱就再见

3楼-- · 2020-03-07 07:08

How to Recursively read an XML document using regex in Java

public static void main(String[] args) {
        String data**="<CheckExistingDSLService>" +
                "<DSLPN>4137361787</DSLPN>" +
                "<DSLPN>8566944014</DSLPN>" +
                "<ClientRequestId>CRID</ClientRequestId>" +
                "<DSLPN>8566944024</DSLPN>" +
                "<ClientSystemId>SSPORD</ClientSystemId>" +
                "<Authentication>" +
                "<Id>SSPORD</Id>" +
                "</Authentication>" +
                "<Comment>Service to check CheckExistingDSL</Comment>"** +
                "</CheckExistingDSLService>";
        System.out.print("The dats is "+listDataElements(data));

    }
    private static final Pattern PATTERN_1 = Pattern.compile("<([^<>]+)>([^<>]+)</\\1>"); 
    private static List<String> listDataElements(CharSequence cs) {     
        List<String> list = new ArrayList<String>();     
        Matcher matcher = PATTERN_1.matcher(cs);    
        while (matcher.find()) {         
            if(matcher.group(1).equalsIgnoreCase("DSLPN")){
                try{
                    Long number=Long.parseLong(matcher.group(2));
                    list.add(number.toString());

                }catch(Exception e){
                    System.out.println("Do noting this is notnumber ");                 
                }
            }
        } return list; 
    }

The Output you will get: The date is [4137361787, 8566944014, 8566944024]

0人赞添加讨论(0) 举报

祖国的老花朵

4楼-- · 2020-03-07 07:25

A basic recursion (not tested but I think it's ok):

private void Caller(String filepath)
{
    XPathDocument oDoc = new XPathDocument(filepath);
    Readnodes( oDoc.CreateNavigator() );
}

private void ReadNodes(XPathNavigator nav)
{
    XPathNodeIterator nodes = nav.Select("menuitem");
    while (nodes.MoveNext())
    {
        //A - read the attribute
        string url = nodes.Current.GetAttribute("navigateurl", string.Empty);

        //B - do something with the data

        //C - recurse
        ReadNodes(nodes.Current);
    }
}

...works because an XPathNodeIterator's Current property is also an XPathNavigator. Obviously you'd need to extend this to push data to a dictionary or keep track of depth or whatever.

0人赞添加讨论(0) 举报

Explosion°爆炸

5楼-- · 2020-03-07 07:29

Use xpath, //menuitem[@navigateurl]/@navigateurl .

This xpath will grab all the menu items which have an attribute naviagate url and return a node-list (xpath 1.0) or sequence (xpath 2.0) of navigateurl values. By having the navigateurl attribute predicate, that ensures that only the leaf menu items are fetched.

0人赞添加讨论(0) 举报

我想做一个坏孩纸

6楼-- · 2020-03-07 07:32

Why use Regex for this when XPath is (to me, at least) the natural choice? That's basically what XSLT should implement...

0人赞添加讨论(0) 举报

我只想做你的唯一

7楼-- · 2020-03-07 07:33

Any particular reason you're using a regex? Have you tried using XPath for this? Here are some examples of how to use XPath. http://www.w3schools.com/XPath/xpath_examples.asp

0人赞添加讨论(0) 举报

Recursively reading an xml document and using rege

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间