I am trying to use WebClient to download a file from the web in a WinForms application. However, I only want to download HTML files; any other type should be ignored.
I checked WebResponse.ContentType, but its value is always null.
Does anyone have any idea what could be the cause?
Given your update, you can do this by changing the .Method in GetWebRequest:
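A minimal sketch of that approach, assuming a hypothetical subclass called HeadCapableWebClient with a HeadOnly flag (neither name exists in the framework) and a placeholder URL:

```csharp
using System;
using System.Net;

// Hypothetical subclass: WebClient itself has no switch for the HTTP verb.
public class HeadCapableWebClient : WebClient
{
    // When true, requests that would be GETs are sent as HEAD instead.
    public bool HeadOnly { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (HeadOnly && request.Method == "GET")
        {
            request.Method = "HEAD"; // ask for the headers only
        }
        return request;
    }
}

public static class HeadThenGetDemo
{
    // True for "text/html", with or without parameters such as "; charset=utf-8".
    public static bool IsHtml(string contentType)
    {
        return contentType != null &&
               contentType.TrimStart().StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        string uri = "http://example.com/"; // placeholder URL
        using (var client = new HeadCapableWebClient())
        {
            client.HeadOnly = true;
            client.DownloadData(uri); // the body should be empty for a HEAD request
            string type = client.ResponseHeaders["Content-Type"];

            client.HeadOnly = false;
            if (IsHtml(type))
            {
                Console.WriteLine(client.DownloadString(uri));
            }
        }
    }
}
```

This costs two round trips for every HTML page, but nothing other than headers is transferred for the files you want to skip.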
Alternatively, you can check the header when overriding GetWebResponse(), perhaps throwing an exception if it isn't what you wanted:
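Something along these lines, as a rough sketch. The class name HtmlOnlyWebClient and the choice of InvalidOperationException are my own assumptions, not anything the framework prescribes:

```csharp
using System;
using System.Net;

// Hypothetical subclass that refuses any response that is not HTML.
public class HtmlOnlyWebClient : WebClient
{
    // True for "text/html", with or without parameters such as "; charset=utf-8".
    public static bool IsHtml(string contentType)
    {
        return contentType != null &&
               contentType.TrimStart().StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        string type = response.Headers["Content-Type"];
        if (!IsHtml(type))
        {
            response.Close(); // release the connection before bailing out
            throw new InvalidOperationException(
                "Unexpected content type: " + (type ?? "(none)"));
        }
        return response;
    }
}
```

As far as I know, WebClient wraps exceptions like this one in a WebException, so the caller would catch WebException and look at InnerException. The asynchronous download methods go through the GetWebResponse(WebRequest, IAsyncResult) overload, which would need the same check.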
WebResponse is an abstract class, and the ContentType property is implemented by the inheriting classes. For instance, HttpWebResponse overrides this property to return the Content-Type header. I'm not sure which concrete WebResponse the WebClient is using. If you ONLY want HTML files, your best bet is to use the HttpWebRequest object directly.
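A minimal sketch of using HttpWebRequest directly, assuming a hypothetical helper called DownloadIfHtml and a placeholder URL. It looks at ContentType as soon as the headers arrive and only reads the body when the type is HTML:

```csharp
using System;
using System.IO;
using System.Net;

public static class HtmlFetcher
{
    // True for "text/html", with or without parameters such as "; charset=utf-8".
    public static bool IsHtml(string contentType)
    {
        return contentType != null &&
               contentType.TrimStart().StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the page text, or null when the response is not HTML.
    public static string DownloadIfHtml(string uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // ContentType holds the raw header value, e.g. "text/html; charset=utf-8".
            if (!IsHtml(response.ContentType))
            {
                return null; // the body is never read by this code
            }
            // Simplification: decodes as UTF-8 and ignores any charset parameter.
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        string html = DownloadIfHtml("http://example.com/"); // placeholder URL
        Console.WriteLine(html ?? "Not an HTML document, skipped.");
    }
}
```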
I'm not sure of the cause, but perhaps you hadn't downloaded anything yet. This is the lazy way to get the content type of a remote file/page. (I haven't checked whether this is efficient on the wire; for all I know, it may download huge chunks of content.)
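A sketch of what such a lazy check could look like, assuming a hypothetical helper called GetContentType and a placeholder URL. It opens the response stream so that the headers become available and then closes it without reading:

```csharp
using System;
using System.IO;
using System.Net;

public static class LazyContentType
{
    // Returns the Content-Type header, or null if the server did not send one.
    public static string GetContentType(string uri)
    {
        using (var client = new WebClient())
        using (Stream stream = client.OpenRead(uri)) // sends the request, waits for the headers
        {
            // ResponseHeaders is null until a response has arrived, which is
            // one way to end up with nothing if you look too early.
            return client.ResponseHeaders["Content-Type"];
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetContentType("http://example.com/") ?? "(no Content-Type header)");
    }
}
```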
Here is a method using TCP, which HTTP is built on top of. It returns once connected or after the timeout (in milliseconds) elapses, so the value may need to be changed depending on your situation.
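A minimal sketch of such a TCP check, assuming a hypothetical helper called CanConnect. Note that it only tells you whether the host accepts connections on that port; it says nothing about the content type:

```csharp
using System;
using System.Net.Sockets;

public static class TcpProbe
{
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static bool CanConnect(string host, int port, int timeoutMs)
    {
        using (var client = new TcpClient())
        {
            try
            {
                // Wait returns false when the timeout elapses before the connect completes.
                return client.ConnectAsync(host, port).Wait(timeoutMs) && client.Connected;
            }
            catch (AggregateException)
            {
                return false; // connection refused, DNS failure, and so on
            }
        }
    }

    public static void Main()
    {
        // Placeholder host; 80 is the default HTTP port.
        Console.WriteLine(CanConnect("example.com", 80, 3000));
    }
}
```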
You could issue the first request with the HEAD verb, and check the content-type response header? [edit] It looks like you'll have to use HttpWebRequest for this, though.
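A rough sketch of the HEAD idea with HttpWebRequest, assuming a hypothetical helper called HeadContentType and a placeholder URL:

```csharp
using System;
using System.Net;

public static class HeadRequestDemo
{
    // Sends a HEAD request and returns the Content-Type header value.
    public static string HeadContentType(string uri)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "HEAD"; // headers only, the server sends no body
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return response.ContentType;
        }
    }

    public static void Main()
    {
        string type = HeadContentType("http://example.com/"); // placeholder URL
        bool isHtml = type.StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
        Console.WriteLine(isHtml ? "HTML, worth a second GET request" : "Skipping: " + type);
    }
}
```

Some servers do not handle HEAD well and answer with an error status, in which case GetResponse throws a WebException, so a fallback to a plain GET may be worth having.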
Your question is a bit confusing: if you're using an instance of the Net.WebClient class, the Net.WebResponse doesn't enter into the equation (apart from the fact that it's indeed an abstract class, and you'd be using a concrete implementation such as HttpWebResponse, as pointed out in another response).
Anyway, when using WebClient, you can achieve what you want by doing something like this:
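A minimal sketch, assuming a temporary file with a random name and a placeholder URL. It downloads first and then inspects WebClient.ResponseHeaders:

```csharp
using System;
using System.IO;
using System.Net;

public static class DownloadThenCheck
{
    public static void Main()
    {
        // Random file name in the temp directory (an assumption for this sketch).
        string localFile = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString());

        using (var client = new WebClient())
        {
            client.DownloadFile("http://example.com/somefile", localFile); // placeholder URL

            string type = client.ResponseHeaders["Content-Type"];
            if (type != null && !type.StartsWith("text/html", StringComparison.OrdinalIgnoreCase))
            {
                File.Delete(localFile); // not HTML, throw it away
            }
            else
            {
                // HTML, or no Content-Type header at all: process the file here.
                Console.WriteLine("Saved to " + localFile);
            }
        }
    }
}
```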
Note that you do have to check for the existence of the Content-Type header, as the server is not guaranteed to return it (although most modern HTTP servers will always include it). If no Content-Type header is present, you can fall back to another HTML detection method, for example by opening the file, reading the first 1K characters or so into a string, and seeing if that contains the substring <html>.
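The fallback could be sketched like this, assuming a hypothetical helper called LooksLikeHtml. It searches for "<html" without the closing bracket and ignores case, which is a small deviation from the exact substring above so that tags with attributes still match:

```csharp
using System;
using System.IO;

public static class HtmlSniffer
{
    // Crude heuristic: true if the first 1024 characters contain "<html".
    public static bool LooksLikeHtml(string path)
    {
        var buffer = new char[1024];
        using (var reader = new StreamReader(path))
        {
            int read = reader.ReadBlock(buffer, 0, buffer.Length);
            string head = new string(buffer, 0, read);
            return head.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0;
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length > 0)
        {
            Console.WriteLine(LooksLikeHtml(args[0]));
        }
    }
}
```

A document that starts with a very long comment or a large inline script before the html tag would slip through, so treat this as a heuristic only.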
Also note that this is a bit wasteful, as you'll always transfer the full file, prior to deciding whether you want it or not. To work around that, switching to the Net.HttpWebRequest/Response classes might help, but whether the extra code is worth it depends on your application...