我工作的一个链接检查,一般我可以执行HEAD
请求,然而,一些网站似乎禁用这个动词,所以失败我还需要执行一个GET
请求(仔细检查链接是真的死了)
我用下面的代码作为我的链接测试仪:
public class ValidateResult
{
public HttpStatusCode? StatusCode { get; set; }
public Uri RedirectResult { get; set; }
public WebExceptionStatus? WebExceptionStatus { get; set; }
}
public ValidateResult Validate(Uri uri, bool useHeadMethod = true,
bool enableKeepAlive = false, int timeoutSeconds = 30)
{
ValidateResult result = new ValidateResult();
HttpWebRequest request = WebRequest.Create(uri) as HttpWebRequest;
if (useHeadMethod)
{
request.Method = "HEAD";
}
else
{
request.Method = "GET";
}
// always compress, if you get back a 404 from a HEAD it can be quite big.
request.AutomaticDecompression = DecompressionMethods.GZip;
request.AllowAutoRedirect = false;
request.UserAgent = UserAgentString;
request.Timeout = timeoutSeconds * 1000;
request.KeepAlive = enableKeepAlive;
HttpWebResponse response = null;
try
{
response = request.GetResponse() as HttpWebResponse;
result.StatusCode = response.StatusCode;
if (response.StatusCode == HttpStatusCode.Redirect ||
response.StatusCode == HttpStatusCode.MovedPermanently ||
response.StatusCode == HttpStatusCode.SeeOther)
{
try
{
Uri targetUri = new Uri(Uri, response.Headers["Location"]);
var scheme = targetUri.Scheme.ToLower();
if (scheme == "http" || scheme == "https")
{
result.RedirectResult = targetUri;
}
else
{
// this little gem was born out of http://tinyurl.com/18r
// redirecting to about:blank
result.StatusCode = HttpStatusCode.SwitchingProtocols;
result.WebExceptionStatus = null;
}
}
catch (UriFormatException)
{
// another gem... people sometimes redirect to http://nonsense:port/yay
result.StatusCode = HttpStatusCode.SwitchingProtocols;
result.WebExceptionStatus = WebExceptionStatus.NameResolutionFailure;
}
}
}
catch (WebException ex)
{
result.WebExceptionStatus = ex.Status;
response = ex.Response as HttpWebResponse;
if (response != null)
{
result.StatusCode = response.StatusCode;
}
}
finally
{
if (response != null)
{
response.Close();
}
}
return result;
}
这一切工作正常,很正常。 除了当我执行GET
请求,整个有效载荷被下载(我在Wireshark中看到这个)。
有什么办法来配置底层ServicePoint
或HttpWebRequest
没有缓冲或急于负载响应主体呢?
(如果我是手工编写这一点,我将设置TCP接收窗口非常低,然后只抢到足够的数据包以获得接头,只要我有足够的信息停止ACKING TCP数据包。)
对于那些想知道这意味着实现,我不想下载一个40K的404,当我得到一个404,这样做了几十万次在网络上昂贵