How to check if System.Net.WebClient.DownloadData

Posted 2019-01-06 11:16

I am trying to use WebClient to download a file from the web in a WinForms application. However, I only want to download HTML files; any other type should be ignored.

I checked the WebResponse.ContentType, but its value is always null.

Does anyone have any idea what the cause could be?

7 answers
Anthone
Reply #2 · 2019-01-06 11:40

Given your update, you can do this by changing the .Method in GetWebRequest:

using System;
using System.Net;
static class Program
{
    static void Main()
    {
        using (MyClient client = new MyClient())
        {
            client.HeadOnly = true;
            string uri = "http://www.google.com";
            byte[] body = client.DownloadData(uri); // note should be 0-length
            string type = client.ResponseHeaders["content-type"];
            client.HeadOnly = false;
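            // check 'tis not binary... we'll use text/, but could
            // check for text/html; the header may be missing, so guard against null
            if (type != null && type.StartsWith("text/", StringComparison.OrdinalIgnoreCase))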
            {
                string text = client.DownloadString(uri);
                Console.WriteLine(text);
            }
        }
    }

}

class MyClient : WebClient
{
    public bool HeadOnly { get; set; }
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest req = base.GetWebRequest(address);
        if (HeadOnly && req.Method == "GET")
        {
            req.Method = "HEAD";
        }
        return req;
    }
}

Alternatively, you can check the header when overriding GetWebResponse(), perhaps throwing an exception if it isn't what you wanted:

protected override WebResponse GetWebResponse(WebRequest request)
{
    WebResponse resp = base.GetWebResponse(request);
    string type = resp.Headers["content-type"];
    // do something with type
    return resp;
}
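To make the "throw if it isn't what you wanted" idea concrete, here is a minimal sketch. The class name HtmlOnlyClient and the helper IsHtml are hypothetical names chosen for illustration, and treating only text/html as acceptable is an assumption about what the asker wants.

```csharp
using System;
using System.Net;

// Hypothetical subclass: refuses any response whose Content-Type is not text/html.
class HtmlOnlyClient : WebClient
{
    // True for "text/html", with or without parameters such as "; charset=utf-8".
    public static bool IsHtml(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return false;
        string mediaType = contentType.Split(';')[0].Trim();
        return mediaType.Equals("text/html", StringComparison.OrdinalIgnoreCase);
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse resp = base.GetWebResponse(request);
        string type = resp.Headers["Content-Type"];
        if (!IsHtml(type))
        {
            // Dispose of the response instead of handing it back to WebClient.
            resp.Close();
            throw new InvalidOperationException(
                "Unexpected content type: " + (type ?? "(none)"));
        }
        return resp;
    }
}
```

As far as I know, WebClient wraps exceptions thrown during a request in a WebException, so a caller of DownloadData or DownloadString would probably need to catch WebException and inspect its InnerException.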
干净又极端
Reply #3 · 2019-01-06 11:42

WebResponse is an abstract class, and its ContentType property is meant to be overridden by inheriting classes. For instance, HttpWebResponse overrides this property to return the Content-Type header. I'm not sure which WebResponse implementation WebClient uses. If you ONLY want HTML files, you're best off using the HttpWebRequest object directly.

淡お忘
Reply #4 · 2019-01-06 11:46

I'm not sure of the cause, but perhaps you hadn't downloaded anything yet. This is the lazy way to get the content type of a remote file/page. (I haven't checked whether this is efficient on the wire; for all I know, it may download huge chunks of content.)

        Stream connection = null;
        WebClient wc = new WebClient();
        string contentType = null; // stays null if the request fails
        try
        {
            // OpenRead sends the request; the headers are available once it returns
            connection = wc.OpenRead(current.Url);
            contentType = wc.ResponseHeaders["content-type"];
        }
        catch (Exception)
        {
            // 404 or what have you
        }
        finally
        {
            if (connection != null) connection.Close();
            wc.Dispose();
        }
戒情不戒烟
Reply #5 · 2019-01-06 11:47

Here is a method using TCP, which HTTP is built on top of. It returns once connected or after the timeout (in milliseconds), so the timeout value may need to be changed depending on your situation.

var result = false;
try {
    using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)) {
        // BeginConnect expects a host name, not a full URI
        var asyncResult = socket.BeginConnect(yourUri.Host, 80, null, null);
        // WaitOne only says the attempt finished in time; Connected confirms it succeeded
        result = asyncResult.AsyncWaitHandle.WaitOne(100, true) && socket.Connected;
    }
}
catch { }
return result;
相关推荐>>
Reply #6 · 2019-01-06 11:56

You could issue the first request with the HEAD verb, and check the content-type response header? [edit] It looks like you'll have to use HttpWebRequest for this, though.

迷人小祖宗
Reply #7 · 2019-01-06 11:56

Your question is a bit confusing: if you're using an instance of the Net.WebClient class, the Net.WebResponse doesn't enter into the equation (apart from the fact that it's indeed an abstract class, and you'd be using a concrete implementation such as HttpWebResponse, as pointed out in another response).

Anyway, when using WebClient, you can achieve what you want by doing something like this:

Dim wc As New Net.WebClient()
Dim LocalFile As String = IO.Path.Combine(Environment.GetEnvironmentVariable("TEMP"), Guid.NewGuid.ToString)
wc.DownloadFile("http://example.com/somefile", LocalFile)
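Dim contentType As String = wc.ResponseHeaders("Content-Type")
' The header may carry parameters such as "; charset=utf-8", so compare the prefix only
If contentType IsNot Nothing AndAlso Not contentType.StartsWith("text/html", StringComparison.OrdinalIgnoreCase) Then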
    IO.File.Delete(LocalFile)
Else
    '//Process the file
End If

Note that you do have to check for the existence of the Content-Type header, as the server is not guaranteed to return it (although most modern HTTP servers will always include it). If no Content-Type header is present, you can fall back to another HTML detection method, for example opening the file, reading the first 1K characters or so into a string, and seeing if that contains the substring <html>.
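The fallback described above (read roughly the first 1K characters and look for an html tag) could be sketched in C# as follows. HtmlSniffer and LooksLikeHtml are hypothetical names, and searching for "<html" without the closing bracket is my assumption, made so that tags with attributes such as <html lang="en"> still match.

```csharp
using System;
using System.IO;

static class HtmlSniffer
{
    // Reads up to maxChars characters and reports whether they contain "<html"
    // (case-insensitive). This is a heuristic, not a real HTML parser.
    public static bool LooksLikeHtml(TextReader reader, int maxChars = 1024)
    {
        char[] buffer = new char[maxChars];
        int read = reader.ReadBlock(buffer, 0, maxChars);
        string head = new string(buffer, 0, read);
        return head.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

For a downloaded file, you would wrap it in a StreamReader, for example `using (var reader = new StreamReader(localFile)) { bool isHtml = HtmlSniffer.LooksLikeHtml(reader); }`, where localFile is the path used in the snippet above.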

Also note that the DownloadFile approach is a bit wasteful, as you always transfer the full file before deciding whether you want it or not. To work around that, switching to the Net.HttpWebRequest/Response classes might help, but whether the extra code is worth it depends on your application...
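For comparison, here is a rough C# sketch of the HttpWebRequest/HttpWebResponse route, which inspects the Content-Type header before reading the body. HtmlDownloader, DownloadIfHtml and IsHtmlContentType are hypothetical names, the URL is assumed to be http or https (otherwise the cast fails), and the body is decoded with StreamReader's default encoding rather than the charset the server declares.

```csharp
using System;
using System.IO;
using System.Net;

static class HtmlDownloader
{
    // Prefix match, so "text/html; charset=utf-8" also counts as HTML.
    public static bool IsHtmlContentType(string contentType)
    {
        return contentType != null &&
               contentType.TrimStart().StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the page text if the server labels it text/html, otherwise null.
    public static string DownloadIfHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            // The headers have arrived at this point; the body has not been read yet.
            if (!IsHtmlContentType(response.ContentType))
            {
                return null; // leaving the using block disposes the response unread
            }
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Setting request.Method = "HEAD" before calling GetResponse would turn the same code into a header-only probe, at the cost of a second request when the content does turn out to be HTML.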
