How to get Web Session from cookie?

Posted 2019-03-04 06:40

Question:

I'm trying to scrape a web page, but in order to POST the data I need a web session ID like

web_session=HQJ3G1GPAAHRZGFR

How can I get that ID?

My code so far is:

Private Sub test()

    Dim postData As String = "web_session=HQJ3G1GPAAHRZGFR&intext=O&term_code=201210&search_type=A&keyword=&kw_scope=all&kw_opt=all&subj_code=BIO&crse_numb=205&campus=*&instructor=*&instr_session=*&attr_type=*&mon=on&tue=on&wed=on&thu=on&fri=on&sat=on&sun=on&avail_flag=on" '/BANPROD/pkgyc_yccsweb.P_Results 
    Dim tempCookie As New CookieContainer
    Dim encoding As New UTF8Encoding
    Dim byteData As Byte() = encoding.GetBytes(postData)

    System.Net.ServicePointManager.SecurityProtocol = Net.SecurityProtocolType.Ssl3
    Try

        tempCookie.GetCookies(New Uri("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results"))
        'postData="web_session=" & tempCookie.

        Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results"), HttpWebRequest)
        postReq.Method = "POST"
        postReq.KeepAlive = True
        postReq.CookieContainer = tempCookie
        postReq.ContentType = "application/x-www-form-urlencoded"


        postReq.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.0.3705; Media Center PC 4.0; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
        postReq.ContentLength = byteData.Length
        Dim postreqstream As Stream = postReq.GetRequestStream
        postreqstream.Write(byteData, 0, byteData.Length)
        postreqstream.Close()
        Dim postresponse As HttpWebResponse
        postresponse = DirectCast(postReq.GetResponse, HttpWebResponse)
        tempCookie.Add(postresponse.Cookies)

        Dim postresreader As New StreamReader(postresponse.GetResponseStream)
        Dim thepage As String = postresreader.ReadToEnd
        MsgBox(thepage)
    Catch ex As WebException
        MsgBox(ex.Status.ToString & vbNewLine & ex.Message.ToString)
    End Try

End Sub

Answer 1:

The problem is that tempCookie.GetCookies() isn't doing what you think it's doing. It only returns the cookies already stored in the container that apply to the supplied URL; it never contacts the server, and because your container is still empty at that point it returns nothing. What you need to do instead is first request a page that will give you this session token, then make the actual request for your data. So first request the page at P_Search, then reuse the same CookieContainer on a second request that posts to P_Results.
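To make the GetCookies() behaviour concrete, here is a small offline sketch. It assumes the server hands out the session as a cookie named web_session (the name is taken from the question's post data and is not confirmed), and it fakes the server's Set-Cookie by adding the cookie by hand; the module name CookieDemo is made up for the example.

```vbnet
Imports System
Imports System.Net

Module CookieDemo
    Sub Main()
        Dim target As New Uri("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results")
        Dim jar As New CookieContainer()

        ' A fresh container holds no cookies, so GetCookies has nothing to return.
        Console.WriteLine(jar.GetCookies(target).Count) ' prints 0

        ' Stand-in for the server's Set-Cookie header (assumption: the cookie is
        ' named "web_session"; the value is the sample one from the question).
        jar.Add(New Cookie("web_session", "HQJ3G1GPAAHRZGFR", "/", "taylor.yc.edu"))

        ' Now GetCookies can find it, because it is stored in the container.
        Dim session As Cookie = jar.GetCookies(target)("web_session")
        If session IsNot Nothing Then
            Console.WriteLine(session.Value)
        End If
    End Sub
End Module
```

In a real run the container is filled by the first response rather than by hand, which is why the initial GET has to happen before anything can be read out of it.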

Instead of the HttpWebRequest object, however, let me point you to the WebClient class and my post here about extending it to support cookies. You'll find that you can simplify your code a lot. Below is a full working VB2010 WinForms app that shows this. If you still want to use the HttpWebRequest object this should at least give you an idea of what needs to be done, too:

Option Strict On
Option Explicit On

Imports System.Net

Public Class Form1

    Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
        ' Create our WebClient
        Using WC As New CookieAwareWebClient()
            ' Set SSLv3
            System.Net.ServicePointManager.SecurityProtocol = Net.SecurityProtocolType.Ssl3
            ' Create a session; ignore what is returned
            WC.DownloadString("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Search")
            ' POST our actual data and get the results
            Dim S = WC.UploadString("https://taylor.yc.edu/BANPROD/pkgyc_yccsweb.P_Results", "POST", "term_code=201130&search_type=K&keyword=math")
            Trace.WriteLine(S)
        End Using
    End Sub
End Class

Public Class CookieAwareWebClient
    Inherits WebClient

    ' One container shared by every request made through this client
    Private cc As New CookieContainer()
    ' URL of the previous request, sent as the Referer of the next one
    Private lastPage As String

    Protected Overrides Function GetWebRequest(ByVal address As System.Uri) As System.Net.WebRequest
        Dim R = MyBase.GetWebRequest(address)
        If TypeOf R Is HttpWebRequest Then
            With DirectCast(R, HttpWebRequest)
                ' Attach the shared container so cookies set by earlier responses are sent back
                .CookieContainer = cc
                If lastPage IsNot Nothing Then
                    .Referer = lastPage
                End If
            End With
        End If
        lastPage = address.ToString()
        Return R
    End Function
End Class
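
If you would rather stay with HttpWebRequest, the same two-step flow might look roughly like the sketch below. It is untested against the live site and rests on a few assumptions: that the session arrives as a cookie named web_session, that the short post body from the WebClient example above is enough, and that the server now negotiates a protocol the framework accepts by default (the Ssl3 line is left out for that reason). The module name SessionScrapeSketch is made up.

```vbnet
Option Strict On
Option Explicit On

Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module SessionScrapeSketch
    Private Const BaseUrl As String = "https://taylor.yc.edu/BANPROD/pkgyc_yccsweb."

    Sub Main()
        ' One container shared by both requests
        Dim jar As New CookieContainer()

        ' Step 1: GET the search page so the server can issue its session cookie
        Dim getReq As HttpWebRequest = DirectCast(WebRequest.Create(BaseUrl & "P_Search"), HttpWebRequest)
        getReq.CookieContainer = jar
        Using getResp As HttpWebResponse = DirectCast(getReq.GetResponse(), HttpWebResponse)
            ' Body is ignored; the container captures any Set-Cookie headers
        End Using

        ' Optional: look at the session ID (assumes the cookie is called "web_session")
        Dim session As Cookie = jar.GetCookies(New Uri(BaseUrl & "P_Results"))("web_session")
        If session IsNot Nothing Then
            Console.WriteLine("Session: " & session.Value)
        End If

        ' Step 2: POST the search with the same container attached
        Dim body As Byte() = Encoding.UTF8.GetBytes("term_code=201130&search_type=K&keyword=math")
        Dim postReq As HttpWebRequest = DirectCast(WebRequest.Create(BaseUrl & "P_Results"), HttpWebRequest)
        postReq.Method = "POST"
        postReq.ContentType = "application/x-www-form-urlencoded"
        postReq.CookieContainer = jar
        postReq.Referer = BaseUrl & "P_Search"
        postReq.ContentLength = body.Length
        Using reqStream As Stream = postReq.GetRequestStream()
            reqStream.Write(body, 0, body.Length)
        End Using

        Using postResp As HttpWebResponse = DirectCast(postReq.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(postResp.GetResponseStream())
                Console.WriteLine(reader.ReadToEnd())
            End Using
        End Using
    End Sub
End Module
```

The important part is that both requests share one CookieContainer. There is no need to call tempCookie.Add(postresponse.Cookies) as in the question, because a container attached to a request is updated from the response automatically.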