How to extract something between <!-- --> using VBA

Posted 2019-08-12 05:17

Question:

I'm trying to scrape a page using VBA. I know how to get elements by ID, class, and tag name, but now I have come across this tag:

<!-- <b>IE CODE : 3407004044</b> -->

After searching online, I know this is an HTML comment, but what I can't find is the tag name of this element, if it counts as a tag at all. Should I use

document.getElementsByTagName("!")?

If not, how else can I extract these comments?

EDIT: I have a bunch of these td elements inside tr elements, and from each one I want to extract the text IE CODE : 3407004044. Below is a larger piece of the HTML:

<tr align="left">
    <td width="50%" class="subhead1">                                                           

    ' this is the part that I want to extract
    <!-- <b>IE CODE : 3108011111</b> -->                                
    </td>
    <td rowspan="9" valign="top">
    <span id="datalist1_ctl00_lbl_p"></span>
    </td>
</tr>

Thanks!

Answer 1:

Give this a try. It works on the sample string below; for your real page you will probably need to adapt it a bit:

Option Explicit

Public Sub TestMe()

    Dim myString    As String
    Dim cnt         As Long
    Dim myArr       As Variant

    myString = "<!-- <b>IE CODE : Koj sega e</b> -->blas<hr>My Website " & _
                    "is here<B><B><B><!-- <b>IE CODE : nomer </b> -->" & _
                    "is here<B><B><B><!-- <b>IE CODE : 1? </b> -->"

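    ' Turn every closing marker into an opening marker, so both act as one delimiter
    myString = Replace(myString, "-->", "<!--")
    ' Text outside comments lands at even indices, comment bodies at odd indices
    myArr = Split(myString, "<!--")

    For cnt = LBound(myArr) To UBound(myArr)
        ' Print only the odd-indexed pieces, i.e. the comment contents
        If cnt Mod 2 = 1 Then Debug.Print myArr(cnt)
    Next cnt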

End Sub

This is what you get:

 <b>IE CODE : Koj sega e</b> 
 <b>IE CODE : nomer </b> 
 <b>IE CODE : 1? </b> 

The idea is the following:

  • Replace the --> with <!--
  • Split the input by <!--
  • Keep every second element of the array (the odd indices); those are the comment bodies

In some cases this will not work, e.g. if --> or <!-- appears somewhere in the text outside a real comment, but in general it should be fine.
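The same idea can be wrapped in a reusable function. The sketch below is an adaptation, not the answer's original code: CommentBodies and DemoCommentBodies are hypothetical names, and instead of Replace/Split it scans with InStr, pairing each <!-- with the next --> and trimming the spaces around each body.

```vba
Option Explicit

' Hypothetical helper (not from the answer above): returns the trimmed text
' between each "<!--" and the next "-->". Plain string scanning, not an HTML parser.
Public Function CommentBodies(ByVal html As String) As Collection
    Const OPEN_MARK As String = "<!--"
    Const CLOSE_MARK As String = "-->"
    Dim result As New Collection
    Dim startPos As Long
    Dim endPos As Long

    startPos = InStr(1, html, OPEN_MARK, vbBinaryCompare)
    Do While startPos > 0
        ' The comment body begins right after the opening marker
        startPos = startPos + Len(OPEN_MARK)
        endPos = InStr(startPos, html, CLOSE_MARK, vbBinaryCompare)
        If endPos = 0 Then Exit Do   ' unterminated comment: stop scanning
        ' Trim$ strips spaces only, not tabs or line breaks
        result.Add Trim$(Mid$(html, startPos, endPos - startPos))
        ' Continue searching after this comment's closing marker
        startPos = InStr(endPos + Len(CLOSE_MARK), html, OPEN_MARK, vbBinaryCompare)
    Loop

    Set CommentBodies = result
End Function

Public Sub DemoCommentBodies()
    Dim body As Variant
    For Each body In CommentBodies("<td><!-- <b>IE CODE : 3108011111</b> --></td>")
        Debug.Print body   ' prints <b>IE CODE : 3108011111</b>
    Next body
End Sub
```

One design difference: a stray --> in ordinary text shifts the odd/even pattern of the Replace/Split approach for everything after it, whereas this scanner only looks for --> after an opening <!--, so it ignores it. A <!-- inside ordinary text still fools both.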

Answer 2:

You can use XPath:

substring-before(substring-after(//tr//comment(), "<b>"), "</b>")

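to get the required data. Note that in XPath 1.0 a string function given a node-set uses only the first node, so this returns the code from the first comment inside a tr; to get every code, select //tr//comment() and apply the two functions to each comment separately.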
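If no XPath engine is at hand, the two string functions in that expression are easy to mirror in VBA. In the sketch below, SubstringAfter, SubstringBefore and IeCodeFromComment are hypothetical helpers; they follow the XPath 1.0 rule of returning an empty string when the marker is not found, and they take the text of a single comment.

```vba
Option Explicit

' Hypothetical helper mirroring XPath 1.0 substring-after():
' returns "" when the marker is absent (case-sensitive comparison).
Public Function SubstringAfter(ByVal s As String, ByVal marker As String) As String
    Dim pos As Long
    pos = InStr(1, s, marker, vbBinaryCompare)
    If pos > 0 Then SubstringAfter = Mid$(s, pos + Len(marker))
End Function

' Hypothetical helper mirroring XPath 1.0 substring-before():
' returns "" when the marker is absent (case-sensitive comparison).
Public Function SubstringBefore(ByVal s As String, ByVal marker As String) As String
    Dim pos As Long
    pos = InStr(1, s, marker, vbBinaryCompare)
    If pos > 0 Then SubstringBefore = Left$(s, pos - 1)
End Function

' Same nesting as the XPath expression, applied to one comment's text
Public Function IeCodeFromComment(ByVal commentText As String) As String
    IeCodeFromComment = SubstringBefore(SubstringAfter(commentText, "<b>"), "</b>")
End Function
```

For the comment in the question, IeCodeFromComment(" <b>IE CODE : 3108011111</b> ") returns IE CODE : 3108011111. Call it once per comment, e.g. in a loop, to collect every code on the page.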