Get text between two strings Regex VB.Net

Posted 2020-05-08 08:51

Question:

I really have serious problems with regex. I need to get all the text between two strings, in this case <span class="user user-role-registered-member"> and </span>.

I googled quite a few questions (some of them on Stack Overflow) and watched YouTube tutorials, but I still can't get it.

This is the code that I thought would work, but I don't know why it doesn't.

' Requires: Imports System.Text.RegularExpressions
Dim mystring As String = "<br>Terms of Service<br></br>Developers<br>"

' Lookbehind and lookahead keep the delimiters out of the match itself.
Dim pattern1 As String = "(?<=<br>)(.*?)(?=<br>)"
Dim pattern2 As String = "(?<=</br>)(.*)(?=<br>)"

Dim m1 As MatchCollection = Regex.Matches(mystring, pattern1)
Dim m2 As MatchCollection = Regex.Matches(mystring, pattern2)
MsgBox(m1(0).ToString) ' Shows: Terms of Service
MsgBox(m2(0).ToString) ' Shows: Developers

OK, so this code works pretty well with <br>. I tried replacing the <br> in pattern1 and pattern2 with the span tags, but then it doesn't work. I know that I am making a mistake here, but I don't know where or how.

Any answer would be really helpful.

Answer 1:

This does the job simply. Because .+ needs at least one character, it won't return a match when there is no text inside the span, so you do not need to worry about empty matches. It will, however, return groups that contain only whitespace.

<span class=""user user-role-registered-member"">(.+)</span>
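A minimal sketch of how this pattern might be used from VB.Net. The input string html and the module name are assumptions made for illustration. The doubled quotes are only VB's way of escaping a quote inside a string literal. Note that the sketch swaps (.+) for the lazy (.+?): with the greedy form, two spans on the same line would come back as one match running to the last </span>.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SpanTextDemo
    Sub Main()
        ' Hypothetical input: two matching spans on one line.
        Dim html As String =
            "<span class=""user user-role-registered-member"">Alice</span>, " &
            "<span class=""user user-role-registered-member"">Bob</span>"

        ' Lazy variant of the pattern above; group 1 holds the inner text.
        Dim pattern As String =
            "<span class=""user user-role-registered-member"">(.+?)</span>"

        ' Prints "Alice" and then "Bob", one per line.
        For Each m As Match In Regex.Matches(html, pattern)
            Console.WriteLine(m.Groups(1).Value)
        Next
    End Sub
End Module
```

Reading m.Groups(1).Value rather than m.Value returns only the inner text, without the surrounding tags.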



Answer 2:

You can also do it with XML:

Dim s As String = "<span class=""user user-role-registered-member"">Keyboard</span>"
Dim doc As New System.Xml.XmlDocument
doc.LoadXml(s)
Console.WriteLine(doc.FirstChild.InnerText) ' Outputs: "Keyboard"

The reasons for not trying to parse HTML with regexes are given in the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".
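When the input holds more than one element, the XmlDocument approach can pick out spans by class with an XPath query. The following is a sketch under assumptions: the wrapping div, the second span, and the module name are made up for illustration, and the input must be well-formed XML or LoadXml will throw.

```vbnet
Imports System
Imports System.Xml

Module XPathDemo
    Sub Main()
        ' Hypothetical, well-formed input with a single root element.
        Dim s As String =
            "<div>" &
            "<span class=""user user-role-registered-member"">Keyboard</span>" &
            "<span class=""other"">Mouse</span>" &
            "</div>"

        Dim doc As New XmlDocument()
        doc.LoadXml(s)

        ' Select only the spans whose class attribute matches exactly.
        Dim nodes As XmlNodeList =
            doc.SelectNodes("//span[@class='user user-role-registered-member']")

        ' Prints "Keyboard"; the span with class "other" is not selected.
        For Each node As XmlNode In nodes
            Console.WriteLine(node.InnerText)
        Next
    End Sub
End Module
```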



Answer 3:

Thank you very much for the answers. I found the answer myself (thanks to Evil Tak, I got an idea).

Dim findtext As String = "(?<=<span class=""user user-role-registered-member"">)(.*?)(?=</span>)"
Dim myregex As String = "<span class=""user user-role-registered-member"">Keyboard</span>"
Dim doregex As MatchCollection = Regex.Matches(myregex, findtext)
MsgBox(doregex(0).ToString)

Stack Overflow is so powerful... ♥



Answer 4:

Use explicit (named) capture groups. The following should do the job:

Dim exp = "<span class=""user user-role-registered-member"">(?<GRP>.*)</span>"
Dim M = System.Text.RegularExpressions.Regex.Match(YourInputString, exp, System.Text.RegularExpressions.RegexOptions.ExplicitCapture)
If M.Groups("GRP").Value <> "" Then
  Return M.Groups("GRP").Value
End If
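The snippet above uses Return, so it has to sit inside a function. One way it might be wrapped, as a sketch: the function name GetSpanText, the module name, and the sample input are assumptions, not part of the answer. The sketch also uses .*? instead of .* so that the capture stops at the first </span>, and it tests m.Success, which means an empty span yields an empty string rather than falling through.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NamedGroupDemo
    ' Hypothetical helper: returns the text of the first matching span,
    ' or Nothing when the input contains no such span.
    Function GetSpanText(input As String) As String
        Dim exp As String =
            "<span class=""user user-role-registered-member"">(?<GRP>.*?)</span>"
        Dim m As Match = Regex.Match(input, exp, RegexOptions.ExplicitCapture)
        If m.Success Then
            Return m.Groups("GRP").Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim sample As String =
            "<span class=""user user-role-registered-member"">Keyboard</span>"
        Console.WriteLine(GetSpanText(sample)) ' Prints: Keyboard
    End Sub
End Module
```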


Answer 5:

Your text is XML, so why hack at strings with regex when you can do it in a readable and clear way?
With LINQ to XML:

Dim htmlPage As XDocument = XDocument.Parse(downloadedHtmlPage)

Dim className As String = "user user-role-registered-member"
' ?. guards against spans without a class attribute and against no match at all.
Dim value As String =
    htmlPage.Descendants("span").
    Where(Function(span) span.Attribute("class")?.Value = className).
    FirstOrDefault()?.Value

And with XML axis properties (see "Accessing XML in Visual Basic"):

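Dim htmlPage As XDocument = XDocument.Parse(downloadedHtmlPage)

Dim className As String = "user user-role-registered-member"
' The attribute axis property already returns the attribute's string value
' (Nothing if the attribute is absent), so no .Value is needed on it.
Dim value As String =
    htmlPage...<span>.
    Where(Function(span) span.@<class> = className).
    FirstOrDefault()?.Value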