select divs and put into collection using htmlagil

Posted 2019-08-12 04:45

Why does this not work? I get a NullReferenceException as soon as the For Each loop starts.

I'm trying to get the text of every div on a page and put each one into my own collection.

Imports HtmlAgilityPack
Imports System.Xml

Partial Class _Default
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load

        Dim webGet As HtmlWeb = New HtmlWeb
        Dim htmlDoc As HtmlDocument = webGet.Load("http://www.mysite.com")

        Dim ids As New List(Of String)()

        For Each div As Object In htmlDoc.DocumentNode.SelectNodes("//div")

            ids.Add(div.InnerText)

        Next



    End Sub
End Class

Exception:

Object reference not set to an instance of an object. Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.NullReferenceException: Object reference not set to an instance of an object.

Source Error:

Line 12:         Dim ids As New List(Of String)()
Line 13:
Line 14:         For Each div As Object In htmlDoc.DocumentNode.SelectNodes("//div")
Line 15:
Line 16:             ids.Add(div.InnerText)

1 answer
Juvenile、少年°
Reply #2 · 2019-08-12 05:52

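Your loop looks correct. Perhaps the URL "http://www.mysite.com" is not returning valid HTML. SelectNodes returns Nothing, not an empty collection, when no node matches the XPath, so the For Each throws a NullReferenceException if the loaded page contains no div elements.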

The code below works:

Imports HtmlAgilityPack

Public Class _Default
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        Dim webGet As HtmlWeb = New HtmlWeb
        Dim htmlDoc As HtmlDocument = webGet.Load("http://stackoverflow.com/q/11528387/1350308")

        Dim ids As New List(Of String)()
        TextBox1.Text = ""
        For Each div As HtmlNode In htmlDoc.DocumentNode.SelectNodes("//div")
            TextBox1.Text += div.Id + Environment.NewLine
        Next
    End Sub

End Class

The result in TextBox1 is shown below (a blank line is a div that has no id attribute):

noscript-padding
notify-container
overlay-header
custom-header

header
portalLink
topbar
hlinks
hsearch

hlogo
hmenus


content

question-header
mainbar
question
adzerk1

















comments-11528387


answers
answers-header

tabs
answer-11528559








comments-11528559

post-editor

wmd-button-bar

draft-saved
draft-discarded
wmd-preview









sidebar
newuser-box



adzerk2
hireme



























feed-link
feed-link-text

prettify-lang
footer

footer-menu
footer-sites
footer-flair
svnrev
copyright
noscript-warning

Complete source: Q11528387WebApp.7z
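
If the page might contain no divs at all, a guard against the Nothing result avoids the exception in the question. The following is only a minimal sketch, assuming HtmlAgilityPack is referenced. The module name DivTextExtractor and the function GetDivTexts are hypothetical helpers that do not appear in the original post, and the HTML is passed in as a string through LoadHtml so the example does not depend on any particular URL.

```vb
Imports System.Collections.Generic
Imports HtmlAgilityPack

Public Module DivTextExtractor

    ' Hypothetical helper (not part of the original post): returns the inner
    ' text of every div in the given HTML, or an empty list if there are none.
    Public Function GetDivTexts(ByVal html As String) As List(Of String)
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)

        Dim texts As New List(Of String)()

        ' SelectNodes returns Nothing (not an empty collection) when the
        ' XPath matches no nodes, so check before iterating.
        Dim divs As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div")
        If divs Is Nothing Then
            Return texts
        End If

        For Each div As HtmlNode In divs
            texts.Add(div.InnerText)
        Next

        Return texts
    End Function

End Module
```

Called with markup that has no div, such as GetDivTexts("<p>no divs here</p>"), this returns an empty list instead of throwing. The same Is Nothing check can be applied in Page_Load to the document returned by webGet.Load. Note that "//div" also matches nested divs, so the text of an inner div appears both in its own entry and in the entry of each enclosing div.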
