I'm writing a program that gets the source code of a web page with a video on it. It then uses regular expressions to isolate the download link of that video, and then uses HttpWebRequest and HttpWebResponse to download the video. My problem arises when a site shows a page where you have to click Continue in order to get to the video page.
For example, there is a video playing on http://nextgenvidz.com/view/s995xvc9e2fv called "The.Matrix.Reloaded.2003.mp4". I tell my program to get the source code for the URL "http://nextgenvidz.com/view/s995xvc9e2fv", but it can't find the video's download link because it's searching the "continue" page's source code. If you go to that URL and view the source, you won't see the link. If you then click Continue and view the source again once the video appears, you'll notice that the file is only present in the second page's source.
How can I get the source code for the page that the video is playing on, and not the page where I have to click continue?
I am trying to use this code:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    TextBox1.Text = "Loading..."
    ' WebRequest.Create returns a WebRequest, so cast it to HttpWebRequest explicitly.
    Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create(TextBox2.Text), System.Net.HttpWebRequest)
    ' The Using blocks close the response and the reader even if reading fails.
    Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
        Using sr As New System.IO.StreamReader(response.GetResponseStream())
            ' Show the raw HTML of whatever page the URL returned.
            TextBox1.Text = sr.ReadToEnd()
        End Using
    End Using
End Sub
Maybe there's a way to auto select the "Continue" button programmatically?
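One way this could work, as a rough sketch only: if the "continue" page is a plain HTML form whose hidden fields are posted back to the same URL (an assumption, I have not checked how this particular site does it), the click can be imitated by fetching the page, collecting the input fields, and sending them back in a POST with the same cookies. The names ContinuePageSketch, ExtractFormFields, BuildFormBody, ReadBody and GetVideoPageSource are made up for illustration. If the button runs JavaScript instead of submitting a form, this will not be enough.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Module ContinuePageSketch
    ' Collects name/value pairs from every <input ...> tag in the HTML.
    ' Assumption: attributes are quoted; a real site may need a proper HTML parser.
    Function ExtractFormFields(ByVal html As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)()
        For Each tag As Match In Regex.Matches(html, "<input\b[^>]*>", RegexOptions.IgnoreCase)
            Dim nameMatch As Match = Regex.Match(tag.Value, "\bname\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase)
            Dim valueMatch As Match = Regex.Match(tag.Value, "\bvalue\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase)
            If nameMatch.Success Then
                fields(nameMatch.Groups(1).Value) = If(valueMatch.Success, valueMatch.Groups(1).Value, "")
            End If
        Next
        Return fields
    End Function

    ' Turns the fields into a URL-encoded request body such as "a=1&b=2".
    Function BuildFormBody(ByVal fields As Dictionary(Of String, String)) As String
        Dim parts As New List(Of String)()
        For Each pair As KeyValuePair(Of String, String) In fields
            parts.Add(Uri.EscapeDataString(pair.Key) & "=" & Uri.EscapeDataString(pair.Value))
        Next
        Return String.Join("&", parts.ToArray())
    End Function

    ' Sends the request and returns the response body as text.
    Private Function ReadBody(ByVal request As HttpWebRequest) As String
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Fetches the "continue" page, then posts its form fields back to the same URL.
    ' Assumption: the form posts to the page's own URL and relies on cookies.
    Function GetVideoPageSource(ByVal url As String) As String
        Dim cookies As New CookieContainer()

        Dim firstRequest As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        firstRequest.CookieContainer = cookies
        Dim continueHtml As String = ReadBody(firstRequest)

        Dim body As Byte() = Encoding.UTF8.GetBytes(BuildFormBody(ExtractFormFields(continueHtml)))

        Dim secondRequest As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        secondRequest.Method = "POST"
        secondRequest.ContentType = "application/x-www-form-urlencoded"
        secondRequest.CookieContainer = cookies
        secondRequest.Referer = url
        secondRequest.ContentLength = body.Length
        Using requestStream As Stream = secondRequest.GetRequestStream()
            requestStream.Write(body, 0, body.Length)
        End Using

        Return ReadBody(secondRequest)
    End Function
End Module
```

In the button handler, TextBox1.Text = GetVideoPageSource(TextBox2.Text) would then show the second page's source, provided the assumptions above hold for the site.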
This guy answered it very well in "How can I get HTML page source for websites in VB.NET?"
This was his code:
I have tried writing something like this in the past and found out that there are a bunch of limitations in place (imposed either by browsers or by the protocol itself) to prevent automation. Creating a universal website parser will be impossible. You would have to write parsing routines for individual sites, based on the way they hide content from you. You first have to determine the pattern each of these sites uses to hide the content from the user, and then implement the actual parsing for each pattern. The patterns are typically a link with the video destination, a button that pops up another window with the video content, or a button that executes JavaScript that dynamically loads a video into the current window.
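To make the per-site idea a bit more concrete, here is a minimal sketch of how the routines could be organized: one parsing function per pattern, registered against the host names that use it. Everything here (SiteParsersSketch, FindDirectLink, RegisterParser, ExtractVideoLink, the host video.example.com and the list of file extensions) is hypothetical and only illustrates the simplest pattern, a direct link in the page source.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module SiteParsersSketch
    ' Maps a host name to the routine that knows how that site hides its video link.
    Private ReadOnly Parsers As New Dictionary(Of String, Func(Of String, String))(StringComparer.OrdinalIgnoreCase)

    ' Pattern 1: the page source contains a direct link to the video file.
    ' Assumption: the link is absolute and ends in one of these extensions.
    Function FindDirectLink(ByVal html As String) As String
        Dim m As Match = Regex.Match(html, "https?://[^""'\s<>]+\.(?:mp4|flv|avi)", RegexOptions.IgnoreCase)
        Return If(m.Success, m.Value, Nothing)
    End Function

    ' Associates a site with the parsing routine for its pattern.
    Sub RegisterParser(ByVal host As String, ByVal parser As Func(Of String, String))
        Parsers(host) = parser
    End Sub

    ' Picks the routine registered for the page's host; returns Nothing for unknown sites.
    Function ExtractVideoLink(ByVal pageUrl As String, ByVal html As String) As String
        Dim parser As Func(Of String, String) = Nothing
        If Parsers.TryGetValue(New Uri(pageUrl).Host, parser) Then
            Return parser(html)
        End If
        Return Nothing
    End Function
End Module
```

A site would be hooked up with something like RegisterParser("video.example.com", AddressOf FindDirectLink). The pop-up and JavaScript patterns would each need their own routine, and those are likely to be much harder than the direct-link case.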