Please consider this question language-agnostic; I would really like to know what happens underneath. The code is in C#. The problem is that I am trying to fetch this URL:
https://fr.wikipedia.org/wiki/Monast%C3%A8re_d%27Arkadi
from code.
Whatever I do, I get either "MaximumAutoRedirects exceeded" or "operation timed out".
Sample code in C# (I see similar results in other languages):
    using System;
    using System.IO;
    using System.Net;

    var url = "https://fr.wikipedia.org/wiki/Monast%C3%A8re_d%27Arkadi";
    var request = (HttpWebRequest)WebRequest.Create(url);
    try
    {
        request.CookieContainer = new CookieContainer();
        // Effectively removes the limit on automatic redirects.
        request.MaximumAutomaticRedirections = Int32.MaxValue;
        request.AllowAutoRedirect = true;
        request.Method = "GET";
        using (WebResponse response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            var res = reader.ReadToEnd();
        }
    }
    catch (Exception ex)
    {
        // The redirect-limit or timeout error surfaces here.
    }
My questions:
- Why is this automatic redirect happening?
- Is it possible to tell a GET made from code apart from one made by a browser? (I don't know how best to ask this. When I access the URL from Chrome, I get the page, but when I perform a GET from code, I get an infinite redirect. So my question is whether those two GETs differ in any way.)
P.S.: This question prompted me to ask this one: "Why Wikipedia returns 301 response code with certain URL?"
Yes. An HTTP server can send you into an infinite redirection scheme. Usually the browser will detect an infinite redirection loop and stop redirecting. A configuration problem on the server side can create a redirection loop. An HTTP client should be able to detect that and stop after a certain number of redirections (and request.MaximumAutomaticRedirections = Int32.MaxValue is maybe too big in that case).
Why is it happening? Well, mistakes. You can write rules in the HTTP server configuration that redirect forever; the HTTP server is not a compiler, so this will not be detected on the server side.
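As an illustration of the client-side safeguard mentioned above (stop after a bounded number of redirections, or as soon as a URL repeats), here is a minimal sketch in Python. It is not what any particular browser or .NET does internally. The names follow_redirects, RedirectLoopError and the fetch callback are hypothetical; fetch is assumed to return the status code and the Location header for one request, and relative Location values are not resolved.

```python
from typing import Callable, List, Optional, Tuple

# Assumed shape of one response: (status_code, location_header_or_None).
FetchResult = Tuple[int, Optional[str]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


class RedirectLoopError(Exception):
    """Raised when a URL repeats or the hop budget is exhausted."""


def follow_redirects(url: str,
                     fetch: Callable[[str], FetchResult],
                     max_hops: int = 10) -> List[str]:
    """Follow redirects one hop at a time and return the chain of URLs visited."""
    chain = [url]
    seen = {url}
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or location is None:
            return chain  # final, non-redirect response
        if location in seen:
            # Same URL twice: following it again would never terminate.
            raise RedirectLoopError(" -> ".join(chain + [location]))
        seen.add(location)
        chain.append(location)
        url = location
    raise RedirectLoopError(f"gave up after {max_hops} redirects")
```

With a fake fetch backed by a dictionary such as {"A": (301, "B"), "B": (301, "A")}, calling follow_redirects("A", ...) raises RedirectLoopError on the second hop instead of spinning until a timeout.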
Or maybe you have a lot of different servers to manage and they do not all get the latest configuration at the same time. In the Wikipedia case it could happen, for example, when someone fixes the redirections of a page. Say you have:
And you fix it to:
If any reverse proxy cache has registered the A -> B redirection in a temporary cache, or if the fix is not immediately available in all database replicas (a wiki stores a lot of its rules in the application database), then you could get both A -> B and B -> A responses on the client side.
Because you wrote:
Note that I'm not a C# expert but it seems legit.
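To make the stale-cache scenario described earlier more concrete, here is a small simulation in Python. It assumes, purely for illustration, that the old rule was A -> B, that the fix replaced it with B -> A, and that a cache in front of the application still remembers the old rule. The helpers make_edge and walk are hypothetical and only model the lookup order; they do not describe how Wikipedia's infrastructure actually works.

```python
def make_edge(cache, origin):
    """Return a lookup that prefers a (possibly stale) cache over the origin rules."""
    def lookup(page):
        if page in cache:
            return cache[page]      # stale answer wins
        return origin.get(page)     # None means "no redirect, serve the page"
    return lookup


def walk(start, lookup, max_hops=6):
    """Follow redirects from start for at most max_hops hops and return the path."""
    path = [start]
    page = start
    for _ in range(max_hops):
        target = lookup(page)
        if target is None:
            break
        path.append(target)
        page = target
    return path


stale_cache = {"A": "B"}   # assumed old rule, still cached
new_rules = {"B": "A"}     # assumed rule after the fix

looping = walk("A", make_edge(stale_cache, new_rules), max_hops=4)
healthy = walk("B", make_edge({}, new_rules))
```

Here looping alternates between A and B until the hop budget runs out, while healthy stops at A once the cache no longer holds the old rule.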
Not sure what you are asking for.
EDIT: OK, so your question is how to track the HTTP traffic the browser generates when a simple GET is made, and maybe also how to track it from your code.
In the browser you have native tools such as the network tab in the developer tools, where you can activate the preserve log button to track redirections; for a simple redirection, though, I think you do not even need to preserve the log to see the 302 responses followed by the other requests.
If you want to capture all HTTP traffic on your computer you can always use Wireshark, which is not so hard. Or you can route all the HTTP traffic of your application or browser through a proxy like Fiddler.
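For tracking redirections from code rather than from a browser or a proxy, one possible approach is to record each hop as the HTTP client follows it. The sketch below uses Python's standard urllib and is only an illustration: LoggingRedirectHandler and trace_redirects are hypothetical names, and trace_redirects performs real network I/O when called.

```python
import urllib.request
from urllib.error import HTTPError


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record every redirect urllib follows, then defer to the default behaviour."""

    def __init__(self):
        self.hops = []  # list of (status_code, from_url, to_url)

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, req.full_url, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def trace_redirects(url, timeout=10):
    """Fetch url and return (status, hops), where hops lists each redirect seen."""
    handler = LoggingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status, handler.hops
    except HTTPError as err:
        # Raised for error statuses and when urllib gives up on a redirect chain.
        return err.code, handler.hops
```

Calling trace_redirects on the URL from the question would return the final status together with the list of hops, which can then be compared with what the browser's network tab shows for the same URL.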