Get the value between <> with dynamic number in

Posted 2020-03-30 05:32

Question:

I am working on a text summarization method. To test it, I use a benchmark called doc 2007, which contains a lot of XML files that I need to clean.

For example, I have an XML file like this:

<sentence id='s0'>
 The nature of the proceeding 

1 The principal issue in this proceeding is whether the Victorian Arts Centre falls within the category of 'premises of State Government Departments and Instrumentalities', for the purposes of provisions in industrial awards relating to rates of payment for persons employed in cleaning those premises.</sentence>

<sentence id='s1'>In turn, this depends upon whether the Victorian Arts Centre Trust, a statutory corporation established by the Victorian Arts Centre Act 1979 (Vic) ('the VAC Act'), is properly described as a State Government department or instrumentality, for the purposes of the award provisions.</sentence>

I need to extract the text inside <sentence id='s0'>...</sentence> and <sentence id='s1'>...</sentence>, so the result should look like this:

The nature of the proceeding 

1 The principal issue in this proceeding is whether the Victorian Arts Centre falls within the category of 'premises of State Government Departments and Instrumentalities', for the purposes of provisions in industrial awards relating to rates of payment for persons employed in cleaning those premises.

In turn, this depends upon whether the Victorian Arts Centre Trust, a statutory corporation established by the Victorian Arts Centre Act 1979 (Vic) ('the VAC Act'), is properly described as a State Government department or instrumentality, for the purposes of the award provisions.

I found something like this:

Regex.Match("User name (sales)", @"\(([^)]*)\)").Groups[1].Value

It uses Regex, but it doesn't work for my case. Could you please suggest a quick solution?
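The Regex snippet above captures text between "(" and ")", so it cannot match the sentence tags. A minimal sketch of the same idea adapted to this file might look like the following. It assumes every tag is written exactly as <sentence id='sN'> with single quotes; the sample string is abbreviated. RegexOptions.Singleline is needed because sentence s0 spans several lines, and the lazy ".*?" stops at the first closing tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Abbreviated sample in the same shape as the question (assumption: ids look like id='sN').
string xml =
    "<sentence id='s0'>\n The nature of the proceeding \n\n1 The principal issue ...</sentence>\n" +
    "<sentence id='s1'>In turn, this depends upon ...</sentence>";

// Group 1 captures the number after "s", group 2 the sentence text.
// Singleline lets "." match newlines; the lazy ".*?" stops at the first </sentence>.
var pattern = new Regex(@"<sentence id='s(\d+)'>(.*?)</sentence>", RegexOptions.Singleline);

var sentences = new List<string>();
foreach (Match m in pattern.Matches(xml))
{
    sentences.Add(m.Groups[2].Value);
    Console.WriteLine($"s{m.Groups[1].Value}: {m.Groups[2].Value}");
}
```

A regex like this breaks as soon as the markup varies (double quotes, extra attributes, entities such as &amp;), which is why the answers below use an XML parser instead.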

Answer 1:

Using LINQ to XML should be easier:

// needs: using System.Linq; using System.Xml.Linq;
var res = XElement.Parse(xml)
                  .Descendants("sentence")
                  .Where(e => (string)e.Attribute("id") == "s0")
                  .FirstOrDefault()?.Value;   // null if no sentence has id "s0"

Or, as Yeldar suggested, a cleaner way is to pass the predicate directly to FirstOrDefault:

var s0 = XElement.Parse(xml)
                 .Descendants("sentence")
                 .FirstOrDefault(e => (string)e.Attribute("id") == "s0")
                 ?.Value;   // (string) cast avoids a NullReferenceException when "id" is missing
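Since the id number changes from sentence to sentence (s0, s1, s2, ...), the lookup above can be parameterized by building the id string from an integer. The sketch below uses a hypothetical helper named GetSentence and assumes the sentences are already wrapped in a single root element, as the second answer explains.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns the text of <sentence id='s{n}'>, or null if there is none.
// The (string) cast on XAttribute yields null instead of throwing when "id" is absent.
static string GetSentence(XElement root, int n) =>
    root.Descendants("sentence")
        .FirstOrDefault(e => (string)e.Attribute("id") == "s" + n)
        ?.Value;

// Abbreviated sample, already wrapped in a single <root> element.
var root = XElement.Parse(
    "<root>" +
    "<sentence id='s0'>The nature of the proceeding</sentence>" +
    "<sentence id='s1'>In turn, this depends upon ...</sentence>" +
    "</root>");

// Walk s0, s1, s2, ... until no sentence with that id exists.
for (int n = 0; GetSentence(root, n) is string text; n++)
    Console.WriteLine($"s{n}: {text}");
```

Looking each id up separately rescans the document every time; for large files, iterating over the elements once (as in the last snippet of this thread) is cheaper.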


Answer 2:

XElement.Parse only accepts a string with a single root node. Your sample has two <sentence> nodes and no common root node, so parsing it as-is fails. You can wrap it in a root node like this:

xml = "<root>" + xml + "</root>";
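Putting the two answers together, a minimal end-to-end sketch might wrap the fragment in a root element and then collect the text of every sentence in document order. The sample string is abbreviated, and trimming surrounding whitespace with Trim() is a choice made here, not something the question requires.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Abbreviated sample: two top-level <sentence> elements and no single root, as in the question.
string xml =
    "<sentence id='s0'>\n The nature of the proceeding \n\n1 The principal issue ...</sentence>\n" +
    "<sentence id='s1'>In turn, this depends upon ...</sentence>";

// Wrap the fragment so XElement.Parse sees exactly one root element.
var root = XElement.Parse("<root>" + xml + "</root>");

// Take every <sentence> child in document order; Value returns all text inside the element.
string[] sentences = root.Elements("sentence")
                         .Select(e => e.Value.Trim())
                         .ToArray();

// Separate sentences with a blank line, like the expected output in the question.
Console.WriteLine(string.Join(Environment.NewLine + Environment.NewLine, sentences));
```

If the real files contain characters that are not valid in XML (for example a bare "&"), XElement.Parse will throw and those characters have to be escaped first.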