How to get the source code of an HTML page using VB

Posted 2019-01-17 18:40

I'm writing a program that gets the source code of a web page with a video on it. It then uses regular expressions to isolate the video's download link, and then uses HttpWebRequest and HttpWebResponse to download the video. My problem arises when a site has a page where you have to click "Continue" to get to the video page.

For example, a video called "The.Matrix.Reloaded.2003.mp4" plays at http://nextgenvidz.com/view/s995xvc9e2fv, so I tell my program to get the source code for that URL. It can't find the video's download link, though, because it is searching the source of the "Continue" page. If you go to that URL and view source, you won't see the link. If you then click Continue and view source again once the video appears, you'll see that the file link is only present in the second page.

How can I get the source code for the page that the video is playing on, and not the page where I have to click continue?

I am trying to use this code:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    TextBox1.Text = "Loading..."
    ' WebRequest.Create returns a WebRequest, so cast explicitly (required under Option Strict On).
    Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(TextBox2.Text), System.Net.HttpWebRequest)

    ' Using blocks dispose the response and the reader even if reading fails.
    Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Using sr As New System.IO.StreamReader(response.GetResponseStream())
            TextBox1.Text = sr.ReadToEnd()
        End Using
    End Using
End Sub

Maybe there's a way to auto select the "Continue" button programmatically?

3 Answers
地球回转人心会变
#2 · 2019-01-17 19:15
Dim PictureURL As String = "http://www.bing.com" & New System.Net.WebClient().DownloadString("http://www.bing.com/HPImageArchive.aspx?format=rss&idx=0&n=1&mkt=de-DE").Replace("<link>", "|").Replace("</link>", "|").Split("|"c)(3)
beautiful°
#3 · 2019-01-17 19:20

This was answered very well in another question:

How can I get HTML page source for websites in VB.NET?

This was his code:

Dim sourceString As String = New System.Net.WebClient().DownloadString("SomeWebPage")
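
As a rough sketch of how that one-liner could feed the regular-expression step described in the question: the code below wraps the download in a Using block so the WebClient is disposed, then pulls out anything that looks like an absolute .mp4 URL. The names PageSourceSketch, GetPageSource, and FindMp4Links are hypothetical, the pattern is only one guess at what a download link looks like, and example.com is a placeholder host.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module PageSourceSketch

    ' Downloads the raw HTML of a page; Using disposes the WebClient afterwards.
    Function GetPageSource(ByVal url As String) As String
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function

    ' Returns every absolute URL ending in .mp4 found in the HTML.
    ' Assumes the link appears literally in the markup, which is not true of every site.
    Function FindMp4Links(ByVal html As String) As List(Of String)
        Dim links As New List(Of String)()
        For Each m As Match In Regex.Matches(html, "https?://[^\s""'<>]+\.mp4", RegexOptions.IgnoreCase)
            links.Add(m.Value)
        Next
        Return links
    End Function

    Sub Main()
        ' Offline demonstration on made-up markup; no network access is needed here.
        Dim sample As String = "<video src=""http://example.com/files/movie.mp4""></video>"
        For Each link As String In FindMp4Links(sample)
            Console.WriteLine(link)
        Next
    End Sub

End Module
```

Note that this only helps when the link is already in the HTML that the server returns for the requested URL. It will not get past a "Continue" page by itself.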
Deceive 欺骗
#4 · 2019-01-17 19:26

I have tried writing something like this in the past and found that there are a bunch of limitations in place (imposed either by browsers or by the protocol itself) to prevent automation. Creating a universal website parser will be impossible. You would have to write parsing routines for individual sites, based on the way they hide content from you. You first have to determine the pattern each of these sites uses to hide the content from the user, and then implement the actual parsing for each pattern. A pattern might be a link with the video destination, a button that pops up another window with the video content, or a button that executes JavaScript that dynamically loads a video into the current window.
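
To illustrate what one such site-specific routine might look like, here is a minimal sketch for a single pattern: a "Continue" button that is an ordinary HTML form posting hidden fields back to the same URL. That is purely an assumption, since the site in the question was not inspected, and the sketch does nothing for pages that load the video through JavaScript. The names ContinueFormSketch, ExtractHiddenFields, BuildFormBody, and FetchPageBehindContinue are hypothetical, as are the field names in the sample markup.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Module ContinueFormSketch

    ' Collects name/value pairs from <input type="hidden" ...> tags.
    ' Assumes double-quoted attributes; a real HTML parser would be more robust.
    Function ExtractHiddenFields(ByVal html As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)()
        For Each tag As Match In Regex.Matches(html, "<input\b[^>]*>", RegexOptions.IgnoreCase)
            If Not Regex.IsMatch(tag.Value, "\stype\s*=\s*""hidden""", RegexOptions.IgnoreCase) Then Continue For
            Dim nameMatch As Match = Regex.Match(tag.Value, "\sname\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)
            Dim valueMatch As Match = Regex.Match(tag.Value, "\svalue\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)
            If nameMatch.Success Then
                fields(nameMatch.Groups(1).Value) = If(valueMatch.Success, valueMatch.Groups(1).Value, "")
            End If
        Next
        Return fields
    End Function

    ' Encodes the fields as an application/x-www-form-urlencoded body.
    Function BuildFormBody(ByVal fields As Dictionary(Of String, String)) As String
        Dim parts As New List(Of String)()
        For Each pair As KeyValuePair(Of String, String) In fields
            parts.Add(Uri.EscapeDataString(pair.Key) & "=" & Uri.EscapeDataString(pair.Value))
        Next
        Return String.Join("&", parts)
    End Function

    ' Sends the request and returns the response body as text.
    Private Function ReadBody(ByVal request As HttpWebRequest) As String
        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Fetches the interstitial page, then posts its hidden fields back with the same cookies.
    ' Assumes the form posts to the same URL it was served from.
    Function FetchPageBehindContinue(ByVal url As String) As String
        Dim cookies As New CookieContainer()

        Dim first As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        first.CookieContainer = cookies
        Dim firstHtml As String = ReadBody(first)

        Dim body As Byte() = Encoding.UTF8.GetBytes(BuildFormBody(ExtractHiddenFields(firstHtml)))
        Dim second As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        second.Method = "POST"
        second.ContentType = "application/x-www-form-urlencoded"
        second.CookieContainer = cookies
        second.Referer = url
        second.ContentLength = body.Length
        Using requestStream As Stream = second.GetRequestStream()
            requestStream.Write(body, 0, body.Length)
        End Using
        Return ReadBody(second)
    End Function

    Sub Main()
        ' Offline demonstration on made-up markup; the field names are invented.
        Dim sample As String = "<form method=""POST""><input type=""hidden"" name=""token"" value=""abc123"">" &
                               "<input type=""submit"" value=""Continue""></form>"
        Console.WriteLine(BuildFormBody(ExtractHiddenFields(sample)))
    End Sub

End Module
```

Sharing one CookieContainer between both requests matters because many interstitial pages set a session cookie on the first visit and check it on the second. If the site uses a different pattern, such as a timed redirect or a script-generated link, this routine would need to be replaced rather than adjusted.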
