Return entire page text from IE object

Posted 2019-07-28 05:58

Question:

I'm using regex with VBA to pick up e-mail addresses on webpages, all of which are formatted very differently. I'm struggling to access the entire page text because of these differences in format.

Currently my approach is just to use

Dim retStr As String
retStr = ie.document.body.innerText

where ie comes from Set ie = CreateObject("InternetExplorer.Application")

This seems simple enough, but on some pages, such as this one, not all of the page text is returned. By "all of the page text", I mean anything that Ctrl+F would find. In the linked page, the text of each 'step' doesn't seem to be returned. I imagine there will be variation between webpages, especially if they aren't formatted in HTML.

Pressing Ctrl+A on the webpage selects the text I'd like. Is there some way of accessing this text without using SendKeys?

Answer 1:

It is working just fine for me. I have a feeling that you are writing the result to an Excel cell, which holds at most 32,767 characters, and hence the text is getting truncated.

I wrote it to a text file and I got the complete text.

Sub Sample()
    Dim ie As Object
    Dim retStr As String

    Set ie = CreateObject("internetexplorer.application")

    With ie
        .Navigate "http://www.wikihow.com/Choose-an-Email-Address"
        .Visible = True
    End With

    '~~> Poll until the page has finished loading (4 = READYSTATE_COMPLETE)
    Do While ie.readyState <> 4: Wait 5: Loop

    DoEvents

    retStr = ie.document.body.innerText

    '~~> Write the above to a text file
    Dim fileNum As Integer
    Dim FlName As String

    '~~> Change this to the relevant path
    FlName = "C:\Users\Siddharth\Desktop\Sample.Txt"

    '~~> Get the next free file number
    fileNum = FreeFile()

    Open FlName For Output As #fileNum

    Print #fileNum, retStr
    Close #fileNum
End Sub

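Private Sub Wait(ByVal nSec As Long)
    '~~> Pause for nSec seconds while keeping the application responsive.
    '~~> Comparing against Now avoids an endless loop when Timer resets at midnight.
    Dim endTime As Date
    endTime = DateAdd("s", nSec, Now)
    While Now < endTime
        DoEvents
    Wend
End Sub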
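
One quick way to test the truncation theory is to compare the length of the returned string with Excel's limit of 32,767 characters per cell. The following is only a minimal sketch: the procedure name CheckCellLimit and the constant MAX_CELL_CHARS are hypothetical, and the placeholder string stands in for ie.document.body.innerText so that the code runs on its own.

```vba
' Hypothetical helper, not part of the original answer.
' Reports whether a string is too long to fit in a single Excel cell.
Sub CheckCellLimit()
    Const MAX_CELL_CHARS As Long = 32767   ' Excel's per-cell character limit
    Dim retStr As String

    ' Placeholder for ie.document.body.innerText (assumption for illustration)
    retStr = String(40000, "x")

    Debug.Print "Characters returned: " & Len(retStr)

    If Len(retStr) > MAX_CELL_CHARS Then
        Debug.Print "Too long for one cell; write it to a file instead."
    Else
        Debug.Print "Fits in one cell."
    End If
End Sub
```

If the length reported for the real page text exceeds the limit, the missing text was probably lost when writing to the cell, not when reading from the IE object.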