WebBrowser Threads don't seem to be closing

Posted 2019-05-25 06:58

Question:

I am using WebBrowser to render JavaScript on web pages so that I can scrape the rendered source code, but after several page loads CPU usage spikes to 100% and the number of threads climbs as well.

I'm assuming that the threads are not closing properly once the webpage has been rendered. I am trying to open the browser, extract the source code, and then close the browser and move to the next page.

I am able to get the rendered page, but this program doesn't make it very far before getting bogged down. I tried adding wb.Stop() but that didn't help. The memory doesn't seem to be the problem (stays at a constant 70% or so).

Here is my source code.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using System.Threading;

namespace Abot.Demo
{
    // Threaded version
    public class HeadlessBrowser
    {
        private static string GeneratedSource { get; set; }
        private static string URL { get; set; }

        public static string GetGeneratedHTML(string url)
        {
            URL = url;

            Thread t = new Thread(new ThreadStart(WebBrowserThread));
            t.SetApartmentState(ApartmentState.STA);
            t.Start();
            t.Join();

            return GeneratedSource;
        }

        private static void WebBrowserThread()
        {
            WebBrowser wb = new WebBrowser();
            wb.Navigate(URL);

            wb.DocumentCompleted +=
                new WebBrowserDocumentCompletedEventHandler(
                    wb_DocumentCompleted);

            while (wb.ReadyState != WebBrowserReadyState.Complete);
                //Application.DoEvents();

            //Added this line, because the final HTML takes a while to show up
            GeneratedSource = wb.Document.Body.InnerHtml;

            wb.Dispose();
            wb.Stop();
        }

        private static void wb_DocumentCompleted(object sender,
            WebBrowserDocumentCompletedEventArgs e)
        {
            WebBrowser wb = (WebBrowser)sender;
            GeneratedSource = wb.Document.Body.InnerHtml;
        }

    }
}

Any suggestions would be appreciated.

Thanks.

Answer 1:

WebBrowser is specifically designed to be used from inside a Windows Forms project. It is not designed to be used outside of one.

Among other things, it is specifically designed to use an application loop, which would exist in pretty much any desktop GUI application. You don't have this, and this is of course causing problems for you because the browser leverages this for its event based style of programming.

A quick word to any future readers who are actually creating a WinForms, WPF, or other application that already has a message loop: do not apply the following code. You should only ever have one message loop in your application. Creating several is setting yourself up for a nightmare.

Since you have no application loop you need to create a new application loop, specify some code to run within that application loop, allow it to pump messages, and then tear it down when you have gotten your result.

public static string GetGeneratedHTML(string url)
{
    string result = null;
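    // Runs a temporary message loop and loads the page inside it.
    ThreadStart pumpMessages = () =>
    {
        EventHandler idleHandler = null;
        idleHandler = (s, e) =>
        {
            // Run this setup only once, on the first idle after the loop starts.
            Application.Idle -= idleHandler;

            WebBrowser wb = new WebBrowser();
            wb.DocumentCompleted += (s2, e2) =>
            {
                // Capture the rendered HTML, then tear down the browser and the loop.
                result = wb.Document.Body.InnerHtml;
                wb.Dispose();
                Application.Exit();
            };
            wb.Navigate(url);
        };
        Application.Idle += idleHandler;
        // Blocks here, pumping messages, until Application.Exit() is called.
        Application.Run();
    };
    // WebBrowser needs an STA thread; reuse the current one if it qualifies.
    if (Thread.CurrentThread.GetApartmentState() == ApartmentState.STA)
        pumpMessages();
    else
    {
        // Otherwise run the loop on a dedicated STA thread and wait for it.
        Thread t = new Thread(pumpMessages);
        t.SetApartmentState(ApartmentState.STA);
        t.Start();
        t.Join();
    }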
    return result;
}
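
To see the message-loop part on its own, here is a minimal sketch with the WebBrowser removed. It assumes a console project that references System.Windows.Forms; the class name MessageLoopDemo is made up for illustration. It uses Application.ExitThread(), which ends only the current thread's loop, in place of the Application.Exit() call above.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical demo class; not part of the question or the answer above.
static class MessageLoopDemo
{
    [STAThread]
    static void Main()
    {
        EventHandler onIdle = null;
        onIdle = (s, e) =>
        {
            // Unsubscribe first so this body runs only once.
            Application.Idle -= onIdle;

            Console.WriteLine("Running inside the message loop.");

            // End the loop on this thread so Application.Run() returns.
            Application.ExitThread();
        };
        Application.Idle += onIdle;

        // Starts a message loop with no form and blocks until it is ended.
        Application.Run();

        Console.WriteLine("Message loop has ended.");
    }
}
```

The shape is the same as in GetGeneratedHTML: subscribe to Application.Idle, do the work once the loop is running, then end the loop so the caller gets control back.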