I've been working on a web crawler written in C# using System.Windows.Forms.WebBrowser. I am trying to download a file from a website and save it on the local machine, and I would like this to be fully automated. The download is started by clicking a button that calls a JavaScript function, which triggers the download and displays a “Do you want to open or save this file?” dialog. I definitely do not want to manually click “Save as” and type in the file name.
I am aware of the download functions in HttpWebRequest and WebClient, but since the download is started by JavaScript, I do not know the URL of the file. FYI, the JavaScript is a doPostBack function that changes some values and submits a form.
I’ve tried getting focus on the save as dialog from WebBrowser to automate it from in there without much success. I know there’s a way to force the download to save instead of asking to save or open by adding a header to the http request, but I don’t know how to specify the filepath to download to.
I think you should prevent the download dialog from even showing. Here might be a way to do that:
The JavaScript code causes your WebBrowser control to navigate to a specific URL (which is what causes the download dialog to appear).
To prevent the WebBrowser control from actually navigating to this URL, attach an event handler to the Navigating event.
In your Navigating handler, determine whether this is the navigation you want to stop (is it the download URL? perhaps check for a file extension; there must be a recognizable format). Use WebBrowserNavigatingEventArgs.Url to do so.
If it is the right URL, stop the navigation by setting the WebBrowserNavigatingEventArgs.Cancel property to true.
Continue the download yourself with the HttpWebRequest or WebClient classes.
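The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the DownloadInterceptor class, the extension list, and the targetFolder parameter are hypothetical names, and the sketch re-requests the URL with a plain GET, so it does not reproduce a form post such as the doPostBack described in the question. It also forwards only the cookies visible through Document.Cookie, which excludes HttpOnly cookies.

```csharp
using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

public static class DownloadInterceptor
{
    // Hypothetical rule: URLs whose path ends in one of these extensions count as downloads.
    private static readonly string[] DownloadExtensions = { ".pdf", ".zip", ".csv" };

    public static bool LooksLikeDownload(Uri url)
    {
        string extension = Path.GetExtension(url.AbsolutePath);
        return Array.Exists(DownloadExtensions,
            ext => string.Equals(ext, extension, StringComparison.OrdinalIgnoreCase));
    }

    public static void Attach(WebBrowser browser, string targetFolder)
    {
        browser.Navigating += (sender, e) =>
        {
            if (!LooksLikeDownload(e.Url))
            {
                return; // ordinary navigation, let the control handle it
            }

            e.Cancel = true; // stop the navigation so no open/save dialog appears

            string targetPath = Path.Combine(targetFolder, Path.GetFileName(e.Url.LocalPath));
            using (var client = new WebClient())
            {
                // Forward the script-visible cookies of the current page, if any.
                string cookies = browser.Document != null ? browser.Document.Cookie : null;
                if (!string.IsNullOrEmpty(cookies))
                {
                    client.Headers[HttpRequestHeader.Cookie] = cookies;
                }
                client.DownloadFile(e.Url, targetPath);
            }
        };
    }
}
```

Calling DownloadInterceptor.Attach(browser, someFolder) once, before the button is clicked, would be enough to wire this up; DownloadFile blocks the UI thread, so a real crawler may prefer DownloadFileAsync.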
Have a look at this page for more info on the event:
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.navigating.aspx
A similar solution is available at
http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/d338a2c8-96df-4cb0-b8be-c5fbdd7c9202/?prof=required
This works perfectly if there is a direct URL that includes the name of the file being downloaded.
But sometimes a URL generates the file dynamically. The URL then has no file name; the website creates the file after the URL is requested, and only then does the open/save dialog appear.
For example, some links generate a PDF file on the fly.
How can such URLs be handled?
Take a look at Erika Chinchio's article at http://www.codeproject.com/Tips/659004/Download-of-file-with-open-save-dialog-box
I have successfully used it for downloading dynamically generated PDF URLs.
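When the URL carries no file name, servers commonly announce one in the Content-Disposition response header instead. Here is a minimal sketch of picking a local file name, assuming the download is performed with HttpClient; DownloadNaming, ResolveFileName, and fallbackName are hypothetical names and are not taken from the article.

```csharp
using System;
using System.IO;
using System.Net.Http.Headers;

public static class DownloadNaming
{
    public static string ResolveFileName(
        ContentDispositionHeaderValue disposition, Uri requestUri, string fallbackName)
    {
        // Prefer the name announced by the server (filename* first, then filename).
        string announced = disposition?.FileNameStar ?? disposition?.FileName;
        if (!string.IsNullOrWhiteSpace(announced))
        {
            // FileName may still carry its surrounding quotes; GetFileName drops any directory part.
            return Path.GetFileName(announced.Trim('"'));
        }

        // Otherwise fall back to the last URL segment, then to the caller's default.
        string fromUrl = Path.GetFileName(requestUri.LocalPath);
        return string.IsNullOrEmpty(fromUrl) ? fallbackName : fromUrl;
    }
}
```

With an HttpResponseMessage named response, the header would be available as response.Content.Headers.ContentDisposition, which is null when the server did not send it.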
Assuming the System.Windows.Forms.WebBrowser was used to access a protected page containing a protected link that you want to download:
This code retrieves the actual link you want to download using the web browser. It will need to be adapted to your specific page. The important part is the documentLinkUrl variable, which is used below.
// Requires: using System.Linq;
var documentLinkUrl = default(Uri);
browser.DocumentCompleted += (object sender, WebBrowserDocumentCompletedEventArgs e) =>
{
    // Find the first anchor whose href points at the download page.
    var downloadLink = browser.Document.ActiveElement
        .GetElementsByTagName("a").OfType<HtmlElement>()
        .Where(atag =>
            atag.GetAttribute("href").Contains("DownloadAttachment.aspx"))
        .First();
    var documentLinkString = downloadLink.GetAttribute("href");
    documentLinkUrl = new Uri(documentLinkString);
};
browser.Navigate(yourProtectedPage);
Now that the web browser has navigated to the protected page and the download link has been acquired, this code downloads the link.
private async Task DownloadLinkAsync(Uri documentLinkUrl)
{
    // Reuse the cookies of the WebBrowser session so the request is authenticated.
    var cookieString = GetGlobalCookies(documentLinkUrl.AbsoluteUri);
    var cookieContainer = new CookieContainer();
    using (var handler = new HttpClientHandler() { CookieContainer = cookieContainer })
    using (var client = new HttpClient(handler) { BaseAddress = documentLinkUrl })
    {
        if (cookieString != null) // GetGlobalCookies returns null when no cookies are found
        {
            cookieContainer.SetCookies(documentLinkUrl, cookieString);
        }
        var response = await client.GetAsync(documentLinkUrl);
        if (response.IsSuccessStatusCode)
        {
            var responseAsStream = await response.Content.ReadAsStreamAsync();
            // Response can be saved from Stream
        }
    }
}
The code above relies on the GetGlobalCookies method by Erika Chinchio, which can be found in the excellent article linked by @Pedro Leonardo:
[System.Runtime.InteropServices.DllImport("wininet.dll", CharSet = System.Runtime.InteropServices.CharSet.Auto, SetLastError = true)]
static extern bool InternetGetCookieEx(string pchURL, string pchCookieName,
    System.Text.StringBuilder pchCookieData, ref uint pcchCookieData, int dwFlags, IntPtr lpReserved);

const int INTERNET_COOKIE_HTTPONLY = 0x00002000;

private string GetGlobalCookies(string uri)
{
    uint uiDataSize = 2048;
    var sbCookieData = new System.Text.StringBuilder((int)uiDataSize);
    if (InternetGetCookieEx(uri, null, sbCookieData, ref uiDataSize,
            INTERNET_COOKIE_HTTPONLY, IntPtr.Zero)
        && sbCookieData.Length > 0)
    {
        // CookieContainer.SetCookies expects cookies separated by commas, not semicolons.
        return sbCookieData.ToString().Replace(";", ",");
    }
    return null;
}
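The "Response can be saved from Stream" comment in DownloadLinkAsync could be filled in along these lines. This is a minimal sketch; ResponseSaver, SaveToFileAsync, and targetPath are hypothetical names chosen for illustration.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class ResponseSaver
{
    public static async Task SaveToFileAsync(HttpResponseMessage response, string targetPath)
    {
        // Copy the response body to disk without loading it all into memory.
        using (Stream source = await response.Content.ReadAsStreamAsync())
        using (FileStream target = File.Create(targetPath))
        {
            await source.CopyToAsync(target);
        }
    }
}
```

Inside the IsSuccessStatusCode branch this would be called as await ResponseSaver.SaveToFileAsync(response, targetPath); note that File.Create overwrites an existing file at that path.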