using InStr to search for quotes, spaces, colons,

Posted 2019-03-06 05:56

Question:

This is a continuation of the question "scrape data from web page source where url doesn't change".

I'm now trying to search the scraped data, but I can't get the code right: it doesn't find the text below.

<span id="middleContent_lbName_county" style="font-weight:bold;">

I tried it like this:

InStr(.Document.Body.innerHTML,"<span id=" & Chr(34) & "middleContent_lbName_county" & Chr(34) & " style=" & Chr(34) & "font-weight" & Chr(58) & "bold" & Chr(59) & "")

and I get 0 in return.

This works:

InStr(.Document.Body.innerHTML, "<span id=" & Chr(34) & "middleContent_lbName_county")

but it's not unique enough, and I get far too many results.

Answer 1:

Your question is a little unclear to me, but the element has an id you can use, and the string you are searching for is the outerHTML of that element:

.document.getElementById("middleContent_lbName_county").outerHTML
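One possible reason the longer InStr pattern returns 0 is that the browser re-serialises the markup, so innerHTML need not match the page source character for character. The sketch below illustrates this with a made-up string; sampleHtml, the space after the colon, and the procedure name DemoInStrWithQuotes are assumptions for illustration, not the real page's output. It also shows that a doubled quote inside a string literal is equivalent to Chr(34).

```vba
Option Explicit

' Sketch only: sampleHtml is hypothetical markup, not the actual page source.
Public Sub DemoInStrWithQuotes()
    Dim sampleHtml As String
    Dim exactPattern As String
    Dim idPattern As String

    ' Assumed serialisation: note the space after the colon in the style value.
    sampleHtml = "<span id=""middleContent_lbName_county"" style=""font-weight: bold;"">Example</span>"

    ' "" inside a string literal yields one double quote, the same as Chr(34).
    exactPattern = "<span id=""middleContent_lbName_county"" style=""font-weight:bold;"
    idPattern = "<span id=""middleContent_lbName_county"""

    Debug.Print InStr(sampleHtml, exactPattern) ' prints 0: "font-weight:bold" has no space
    Debug.Print InStr(sampleHtml, idPattern)    ' prints 1: match starts at the first character
End Sub
```

Including the closing quote in idPattern also stops it from matching longer ids that merely start with the same text.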

The info within is:

.document.getElementById("middleContent_lbName_county").innerText

Reading .innerText returns the facility name.
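If the id might be missing on some pages, getElementById returns Nothing and reading .innerText directly raises a run-time error. A minimal sketch of a guarded lookup follows; GetTextById is a hypothetical helper name, and doc is assumed to be the loaded page's document object.

```vba
Option Explicit

' Hypothetical helper: returns the element's text, or "" when the id is not found.
Public Function GetTextById(ByVal doc As Object, ByVal elementId As String) As String
    Dim el As Object
    Set el = doc.getElementById(elementId)
    ' Only read innerText when the element actually exists.
    If Not el Is Nothing Then GetTextById = el.innerText
End Function
```

Inside a With IE block this could be called as Debug.Print GetTextById(.document, "middleContent_lbName_county").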

With your former code:

Option Explicit
Public Sub VisitPages()
    ' Requires a reference to Microsoft Internet Controls (Tools > References)
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "http://healthapps.state.nj.us/facilities/acSetSearch.aspx?by=county"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#middleContent_cbType_5").Click
            .querySelector("#middleContent_cbType_12").Click
            .querySelector("#middleContent_btnGetList").Click
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim list As Object, i As Long
        Set list = .document.querySelectorAll("#main_table [href*=doPostBack]")
        For i = 0 To list.Length - 1
            list.item(i).Click

            While .Busy Or .readyState < 4: DoEvents: Wend

            ' Application.Wait Now + TimeSerial(0, 0, 3) '<== Delete me later. This is just to demo page changes
            Debug.Print .document.getElementById("middleContent_lbName_county").outerHTML
            'do stuff with new page
            .Navigate2 .document.URL             '<== back to homepage
            While .Busy Or .readyState < 4: DoEvents: Wend
            Set list = .document.querySelectorAll("#main_table [href*=doPostBack]") 'reset list (often required in these scenarios)
        Next
        Stop                                     '<== Delete me later
        '.Quit '<== Remember to quit application
    End With
End Sub

Some sample results: