How can I extract a value from an HTML page in VBScript?

Posted 2019-03-01 10:41

Question:

Below is some code I tried in order to get the value of a node in a web page, but it fails when setting objNode. Any help would be gratefully appreciated.

Dim objHttp, sWebPage, objNode, objDoc

Set objDoc = CreateObject("MSXML2.DOMDocument")
objDoc.Load "http://www.hl.co.uk/shares/shares-search-results/a/aveva-group-plc-ordinary-3.555p"

' objDoc.setProperty "SelectionLanguage", "XPath"

' Find a particular element using XPath:
Set objNode = objDoc.selectSingleNode("span[@id='ls-bid-AVV-L']")
MsgBox objNode.getAttribute("value")

Answer 1:

  1. It's very optimistic to expect an XML parser to handle even clean HTML; for flawed HTML, you can forget it.
  2. You should never call .load without checking for errors. In your case, the .parseError.reason reported is "The attribute 'property' on this element is not defined in the DTD/Schema."
  3. You can switch off validation with objDoc.validateOnParse = False, and avoid problems with large pages by loading synchronously with objDoc.async = False (this at least gets rid of the "msxml3.dll: The data necessary to complete this operation is not yet available." error).
  4. To search for a span anywhere in the document (without knowing its place in the hierarchy) you need "//span[@id='ls-bid-AVV-L']" instead of "span[@id='ls-bid-AVV-L']".
  5. The span you are looking for has no attribute named value; to get the "1,334.00p" you'd need to ask for objNode.text.
  6. But all this is to no avail: the page is not even well-formed. After the changes above, the .parseError.reason is "End tag 'div' does not match the start tag 'input'.".
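
For reference, points 2 to 5 put together would look roughly like the sketch below. It is only an illustration of the error checking and the corrected XPath, not a working solution for this page: as point 6 says, the load still fails here because the HTML is not well-formed XML, so the script ends in the "Load failed" branch. Enabling the "SelectionLanguage" property (commented out in the question) is an assumption on my part to make the query explicit XPath.

```vbscript
Option Explicit
Dim objDoc, objNode

Set objDoc = CreateObject("MSXML2.DOMDocument")
objDoc.async = False              ' load synchronously, so the data is complete before we query it
objDoc.validateOnParse = False    ' skip DTD/Schema validation
objDoc.setProperty "SelectionLanguage", "XPath"

' .load returns False on failure; always check it and report parseError
If Not objDoc.load("http://www.hl.co.uk/shares/shares-search-results/a/aveva-group-plc-ordinary-3.555p") Then
    WScript.Echo "Load failed (line " & objDoc.parseError.line & "): " & objDoc.parseError.reason
    WScript.Quit 1
End If

' "//" searches the whole document rather than only the children of the context node
Set objNode = objDoc.selectSingleNode("//span[@id='ls-bid-AVV-L']")
If objNode Is Nothing Then
    WScript.Echo "Span not found"
Else
    WScript.Echo objNode.text     ' the span has no "value" attribute; read its text instead
End If
```

The same pattern should work for documents that really are well-formed XML (or XHTML); for ordinary HTML a real HTML parser is needed.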


Answer 2:

Use the Internet Explorer COM object:

url = "http://www.hl.co.uk/shares/shares-search-results/a/aveva-group-plc-ordinary-3.555p"

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate url
' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
Do While ie.Busy Or ie.ReadyState <> 4
  WScript.Sleep 100
Loop

MsgBox ie.document.getElementById("ls-bid-AVV-L").innerText

ie.Quit