HttpWebResponse Raw Response, using Reflection

Posted 2019-02-22 08:45

Question:

HttpWebResponse exposes properties for the headers. Is it possible, perhaps using Reflection, to get the raw response (headers and content combined), like the one we get when using a socket? I think there must be a way.

I can use a socket, but a lot of work is needed to make sockets usable: proxy support, HTTPS, progress events, and so on. The list is long, and I have been strongly advised to use HttpWebRequest. The only problem is that I need the raw headers along with the response. The websites I am trying to download from send a very long and weird cookie, which is not handled by HttpWebRequest or WebClient. For example, I am not able to log in to any WordPress blog using WebClient, but with sockets and manual cookie handling it works perfectly. It may be a bug in WebClient.

1) I just need the raw headers; that will do the trick.

2) And also article link

The article says that HttpWebRequest has a problem: only one thread is downloading while the others are kept waiting. If this is true, are sockets the better choice?

The article says: This code works well but it has a very serious problem as the WebRequest class function GetResponse locks the access to all other processes, the WebRequest tells the retrieved response as closed, as in the last line in the previous code. So I noticed that always only one thread is downloading while others are waiting to GetResponse. To solve this serious problem, I implemented my two classes, MyWebRequest and MyWebResponse using Socket.

Answer 1:

There is a way to get raw headers:

var rawHeaders = request.GetResponse().Headers.ToString();

With the website and request you provided, it returned:

Pragma: no-cache
X-Frame-Options: SAMEORIGIN
Cache-Control: no-cache, must-revalidate, max-age=0
Date: Wed, 03 Aug 2011 12:08:49 GMT
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Set-Cookie: wordpress_test_cookie=WP+Cookie+check;     path=/,wordpress_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/wp-admin,wordpress_sec_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/wp-admin,wordpress_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/wp-content/plugins,wordpress_sec_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/wp-content/plugins,wordpress_logged_in_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpress_logged_in_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpress_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpress_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpress_sec_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpress_sec_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpressuser_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpresspass_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpressuser_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/,wordpresspass_c2d1208bd3bc2294298da94d67693495=+; expires=Tue, 03-Aug-2010 12:08:49 GMT; path=/
Server: Apache
X-Powered-By: PHP/5.2.17
Last-Modified: Wed, 03 Aug 2011 12:08:49 GMT
Content-Type: text/html; charset=UTF-8
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Via: 1.0 localhost (squid/3.1.6)
Connection: close

Does this solve your problem?
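
The long Set-Cookie line above is the merged form in which WebHeaderCollection renders repeated headers. As a minimal sketch of the difference between that merged view and the per-value view, the snippet below builds a collection by hand instead of making a request. The class name HeaderViews and the cookie names and values are made up for illustration. Note that GetValues still returns parsed values, not the raw bytes from the wire.

```csharp
using System;
using System.Net;

public static class HeaderViews
{
    // Builds a collection by hand; the cookie names and values are hypothetical.
    public static WebHeaderCollection BuildSample()
    {
        var headers = new WebHeaderCollection();
        headers.Add("Set-Cookie", "a=1; path=/");
        headers.Add("Set-Cookie", "b=2; path=/wp-admin");
        return headers;
    }

    // Prints the merged view first, then one line per individual value.
    public static void Print(WebHeaderCollection headers)
    {
        // Merged view: the indexer returns all Set-Cookie values as one string.
        Console.WriteLine(headers["Set-Cookie"]);

        // Per-value view: GetValues returns one array entry per cookie.
        string[] values = headers.GetValues("Set-Cookie") ?? new string[0];
        foreach (string value in values)
        {
            Console.WriteLine(value);
        }
    }
}
```

With a real response, the same calls would apply to response.Headers, for example HeaderViews.Print(response.Headers).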

About Sockets instead of WebRequests - I would recommend against that approach. It is reinventing the wheel.

UPDATE

This does not solve the problem, as the headers above are already parsed in a lossy way (see comments for details). Upon closer inspection, I came to the conclusion that the raw header bytes are already lost by the time HttpWebRequest.GetResponse() returns.

The core parsing is done in System.Net.WebHeaderCollection.ParseHeaders() or System.Net.WebHeaderCollection.ParseHeadersStrict() (depending on the value of System.Net.Configuration.SettingsSectionInternal.Section.UseUnsafeHeaderParsing) and both methods fail to record the required information. Soon after, the buffer they operate on (System.Net.Connection.m_ReadBuffer) is filled with new data from the wire. The original headers are lost.

In order to save the raw data, you would need to reimplement the System.Net.Connection class, which is internal and hard-referenced by ServicePoint, which is public, but still hard-referenced by HttpWebRequest. To sum up, you would have to reimplement the whole stack.

So unless you can change the website behavior or live without these cookies, you will need to use a Socket. If that's the case, I would like to offer my condolences.
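
If you do end up on the socket route, capturing the raw header block itself is the small part. The following is only a sketch under stated assumptions, not a replacement for HttpWebRequest: it handles plain HTTP on port 80 only, with no proxy, HTTPS, redirect, or chunked-body support. The names RawHeaderReader, ReadRawHeaders, and FetchRawHeaders are hypothetical.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class RawHeaderReader
{
    // Reads bytes until the blank line (CR LF CR LF) that ends the header block
    // and returns them undecoded apart from a 1:1 byte-to-char mapping.
    // Reading one byte at a time is slow, but it never consumes body bytes.
    public static string ReadRawHeaders(Stream stream)
    {
        byte[] terminator = { 13, 10, 13, 10 };
        var buffer = new MemoryStream();
        int matched = 0; // number of terminator bytes matched so far

        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            buffer.WriteByte((byte)b);
            if (b == terminator[matched]) matched++;
            else matched = (b == terminator[0]) ? 1 : 0;
            if (matched == terminator.Length) break;
        }

        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        return Encoding.GetEncoding("ISO-8859-1").GetString(buffer.ToArray());
    }

    // Sends a bare GET over plain HTTP and returns the untouched header block.
    public static string FetchRawHeaders(string host, string path)
    {
        using (var client = new TcpClient(host, 80))
        using (NetworkStream stream = client.GetStream())
        {
            string request = "GET " + path + " HTTP/1.1\r\n" +
                             "Host: " + host + "\r\n" +
                             "Connection: close\r\n\r\n";
            byte[] bytes = Encoding.ASCII.GetBytes(request);
            stream.Write(bytes, 0, bytes.Length);
            return ReadRawHeaders(stream);
        }
    }
}
```

Calling RawHeaderReader.FetchRawHeaders("example.com", "/") would return the status line and every header line exactly as sent, including each Set-Cookie line on its own. For HTTPS, the NetworkStream would have to be wrapped in an SslStream before writing the request, which is part of the extra work mentioned above.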