Get HTML from Frame using WebBrowser control - una

Posted 2019-05-31 20:14

Question:

I'm looking for a free tool or DLLs that I can use from my own .NET code to process some web requests. Say I have a URL with query string parameters, such as http://www.example.com?param=1. When I open it in a browser, several redirects occur, and the HTML that is finally rendered has a frameset; the inner HTML of one frame contains a table with the data I need. I want to store this data in an external file in CSV format. The data differs depending on the value of the query string parameter param. Say I want to run the application and generate 1000 CSV files, for param values from 1 to 1000.

I have good knowledge of .NET, JavaScript, and HTML, but the main problem is how to get the final HTML in the server code.

What I tried: I created a new Windows Forms application, added a WebBrowser control, and used code like this:

    private void FormMain_Shown(object sender, EventArgs e)
    {
        var param = 1; //test
        var url = string.Format(Constants.URL_PATTERN, param);

        WebBrowserMain.Navigated += WebBrowserMain_Navigated;
        WebBrowserMain.Navigate(url);
    }

    void WebBrowserMain_Navigated(object sender, WebBrowserNavigatedEventArgs e)
    {
        if (e.Url.OriginalString == Constants.FINAL_URL)
        {
            var document = WebBrowserMain.Document.Window.Frames[0].Document;
        }
    }

Unfortunately, I receive an UnauthorizedAccessException, probably because the frame and the document are in different domains. Does anybody have an idea how to work around this, or perhaps a completely different approach to implementing this kind of functionality?

Answer 1:

Thanks to Noseratio's comments, I managed to do this with the WebBrowser control. Here are the major points that might help others with similar questions:

1) The DocumentCompleted event should be used. In the Navigated event, the body of the document is still null.

2) The following answer helped a lot: WebBrowserControl: UnauthorizedAccessException when accessing property of a Frame

3) I was not aware of IHTMLWindow2 and similar interfaces. For them to work correctly, I added references to the following COM libraries: Microsoft Internet Controls (SHDocVw) and Microsoft HTML Object Library (MSHTML).

4) I grabbed the HTML of the frame with the following code:

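    // Requires "using mshtml;" (COM reference: Microsoft HTML Object Library).
    void WebBrowserMain_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (e.Url.OriginalString == Constants.FINAL_URL)
        {
            try
            {
                var doc = (IHTMLDocument2) WebBrowserMain.Document.DomDocument;
                var frame = (IHTMLWindow2) doc.frames.item(0);
                // CrossFrameIE is the helper class from the answer linked in point 2.
                var document = CrossFrameIE.GetDocumentFromWindow(frame);
                var html = document.body.outerHTML;

                var dataParser = new DataParser(html);
                //my logic here
            }
            catch (Exception ex)
            {
                // The original snippet ends before this point; this handler is an assumed placeholder.
                System.Diagnostics.Debug.WriteLine(ex);
            }
        }
    }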

5) To work with the HTML, I used the HTML Agility Pack, which has good XPath search support.
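
For readers who want a starting point for the last step, here is a minimal sketch of how the frame's HTML could be turned into CSV with the HTML Agility Pack. It is not the DataParser class from the answer above (that code was not posted). The class name TableToCsv, the choice to take only the first table in the markup, and the quoting rules are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of the original answer.
public static class TableToCsv
{
    // Quote a field only when it contains a comma, a double quote, or a line break.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Convert the first <table> found in the HTML into CSV text, one line per <tr>.
    public static string Convert(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("(//table)[1]//tr");
        if (rows == null)
            return string.Empty;

        var lines = new List<string>();
        foreach (var row in rows)
        {
            var cells = row.SelectNodes("./th|./td");
            if (cells == null)
                continue; // skip rows without cells

            // Decode entities such as &amp; and trim surrounding whitespace.
            lines.Add(string.Join(",", cells.Select(
                c => Escape(HtmlEntity.DeEntitize(c.InnerText).Trim()))));
        }
        return string.Join(Environment.NewLine, lines);
    }
}
```

Inside the DocumentCompleted handler, the result could then be saved with something like System.IO.File.WriteAllText("data_" + param + ".csv", TableToCsv.Convert(html)), where the file name pattern is again only an example. Note that the XPath also picks up rows of any table nested inside the first one, so a real page may need a more specific expression.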