C# Console/Server access to web site

Posted 2019-07-04 19:18

Question:

I am working on a C# project where I need to get data from a secured web site that has no API or web services. My plan is to log in, navigate to the page I need, and parse the HTML to extract the data I need to log to a database. Right now I'm testing with a console app, but eventually this will be converted to an Azure Service Bus application.

To get to anything, you have to log in at their login.cfm page, which means I need to fill in the username and password input controls on the page and click the submit button, then navigate to the page I need to parse.

Since I don't have a 'browser' to parse for controls, I am trying to use various C# .NET classes to get to the page, set the username and password, and click submit, but nothing seems to work.

Any examples I can look at, or .NET classes I should be reviewing that were designed for this sort of project?

Thanks!

Answer 1:

Use the WebClient class in System.Net.

To persist the session cookie across requests, you'll have to make a custom WebClient class that attaches a shared CookieContainer to each request.

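using System;
using System.Net;

#region webclient with cookies
// WebClient subclass that shares one CookieContainer across all requests,
// so the session cookie set at login is sent again on later requests.
public class WebClientX : WebClient
{
    public CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri location)
    {
        WebRequest req = base.GetWebRequest(location);
        // Attach the shared container to HTTP(S) requests only.
        if (req is HttpWebRequest httpReq)
            httpReq.CookieContainer = cookies;
        return req;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse res = base.GetWebResponse(request);
        // Copy the response cookies into the shared container.
        if (res is HttpWebResponse httpRes)
            cookies.Add(httpRes.Cookies);
        return res;
    }
}
#endregion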

Use a browser add-on such as Firebug, or the developer tools built into Chrome, to capture the HTTP POST data that is sent when you submit the form. Send the same POSTs using the WebClientX class and parse the response HTML.

The fastest way to parse HTML when you already know the format is a simple Regex.Match. So you'd go through the actions in your browser with the developer tools open to record your POSTs, URLs, and HTML content, then perform the same steps using WebClientX.
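As a rough illustration of the Regex.Match approach, here is a minimal sketch that pulls the text out of a single table cell. The class name HtmlExtract, the method CellById, and the id "status" are all hypothetical; the real pattern depends on the markup you record from the site, and a pattern like this will break if that markup changes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlExtract
{
    // Returns the trimmed inner text of the first <td> whose id attribute
    // equals the given id, or null when no such cell is found.
    // Assumes the attribute is written with double quotes: id="...".
    public static string CellById(string html, string id)
    {
        string pattern = "<td[^>]*\\bid=\"" + Regex.Escape(id) + "\"[^>]*>(.*?)</td>";
        Match m = Regex.Match(html, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```

For example, calling HtmlExtract.CellById on the string `<td id="status"> Shipped </td>` with the id "status" returns "Shipped". This only works well for small, predictable fragments; it is not a general HTML parser.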



Answer 2:

OK, here is the complete code to log in on one page, then read from a second page after the login.

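        using System;
        using System.Collections.Specialized;
        using System.IO;
        using System.Net;
        using System.Text;

        class Program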
        {
            static void Main(string[] args)
            {

                string uriString = "http://www.remotesite.com/login.cfm";

                // Create a new WebClient instance.
                WebClientX myWebClient = new WebClientX();

                // Create a new NameValueCollection instance to hold some custom parameters to be posted to the URL.
                NameValueCollection myNameValueCollection = new NameValueCollection();

                // Add necessary parameter/value pairs to the name/value container.
                myNameValueCollection.Add("userid", "myname");
                myNameValueCollection.Add("mypassword", "mypassword");

                Console.WriteLine("\nUploading to {0} ...", uriString);
                // UploadValues(String, NameValueCollection) implicitly uses HTTP POST as the request method.
                byte[] responseArray = myWebClient.UploadValues(uriString, myNameValueCollection);

                // Decode and display the response.
                Console.WriteLine("\nResponse received was :\n{0}", Encoding.UTF8.GetString(responseArray));

                Console.WriteLine("\n\n\n pausing...");
                Console.ReadKey();

                // Go to 2nd page on the site to get additional data
                Stream myStream = myWebClient.OpenRead("https://www.remotesite.com/status_results.cfm?t=8&prog=d");

                Console.WriteLine("\nDisplaying Data :\n");
                StringBuilder sb = new StringBuilder();

                using (StreamReader reader = new StreamReader(myStream, System.Text.Encoding.UTF8))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        sb.Append(line + "\r\n");
                    }
                }

                using (StreamWriter outfile = new StreamWriter(@"Logfile1.txt"))
                {
                    outfile.Write(sb.ToString());
                }

                Console.WriteLine(sb.ToString());

                Console.WriteLine("\n\n\n pausing...");
                Console.ReadKey();

            }

        }

        public class WebClientX : WebClient
        {
            public CookieContainer cookies = new CookieContainer();
            protected override WebRequest GetWebRequest(Uri location)
            {
                WebRequest req = base.GetWebRequest(location);
                if (req is HttpWebRequest)
                    (req as HttpWebRequest).CookieContainer = cookies;
                return req;
            }
            protected override WebResponse GetWebResponse(WebRequest request)
            {
                WebResponse res = base.GetWebResponse(request);
                if (res is HttpWebResponse)
                    cookies.Add((res as HttpWebResponse).Cookies);
                return res;
            }
        }