I'm using the script below to retrieve HTML from a URL.
string webURL = @"https://nl.wiktionary.org/wiki/" + word.ToLower();
using (WebClient client = new WebClient())
{
    string htmlCode = client.DownloadString(webURL);
}
The variable word can be any word. When there is no wiki page for the word being retrieved, the code ends with a 404 error, while retrieving the same URL in a browser opens a wiki page saying there is no page for this item yet.
What I want is for the code to always get the HTML, even when the wiki page says there is no info yet. I do not want to suppress the 404 error with a try/catch.
Does anyone have an idea why this is not working with WebClient?
Try this. You can catch the content of the 404 error in a try/catch block.
// Requires: using System; using System.IO; using System.Net;
var word = Console.ReadLine();
string webURL = @"https://nl.wiktionary.org/wiki/" + word.ToLower();

using (WebClient client = new WebClient())
{
    try
    {
        string htmlCode = client.DownloadString(webURL);
    }
    catch (WebException exception)
    {
        // The server still sends a body with the 404 (the custom "no page yet" page);
        // it is available on the exception's response stream.
        string responseText = string.Empty;
        var responseStream = exception.Response?.GetResponseStream();
        if (responseStream != null)
        {
            using (var reader = new StreamReader(responseStream))
            {
                responseText = reader.ReadToEnd();
            }
        }
        Console.WriteLine(responseText);
    }
}
Console.ReadLine();
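If you also need the numeric status code, the same WebException exposes it. A small, hypothetical addition to the catch block above, assuming the response is an HttpWebResponse:

var httpResponse = exception.Response as HttpWebResponse;
if (httpResponse != null)
{
    // For the missing-page case this prints 404.
    Console.WriteLine((int)httpResponse.StatusCode);
}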
Since this wiki server uses case-sensitive URL mapping, just don't modify the case of the URL you harvest (remove ".ToLower()" from your code).
Ex.:
Lower case:
https://nl.wiktionary.org/wiki/categorie:onderwerpen_in_het_nynorsk
Result: HTTP 404 (Not Found)
Normal (unmodified) case:
https://nl.wiktionary.org/wiki/Categorie:Onderwerpen_in_het_Nynorsk
Result: HTTP 200 (OK)
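Concretely, a minimal sketch of the question's snippet with the only change being that ".ToLower()" is dropped (the word variable is assumed to already match the page title's casing):

// Keep the word's original casing; the wiki's URL mapping is case-sensitive.
string webURL = @"https://nl.wiktionary.org/wiki/" + word;
using (WebClient client = new WebClient())
{
    string htmlCode = client.DownloadString(webURL);
}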
Also, keep in mind that most (if not all) wiki servers (including this one) generate custom 404 pages, so in a browser they look like "normal" pages, but they are nevertheless served with a 404 HTTP status code.
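If the goal is still to read the HTML of such a custom 404 page without a try/catch, one option (a sketch, not something the question's code uses) is to switch from WebClient to HttpClient: GetAsync does not throw on a 404, so the body can be read and the status code checked explicitly.

using System;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        var word = Console.ReadLine();
        // Keep the original casing; the URL mapping is case-sensitive.
        string webURL = "https://nl.wiktionary.org/wiki/" + word;

        using (var client = new HttpClient())
        {
            // GetAsync only throws on network-level failures, not on 404;
            // the status code is reported on the response object instead.
            HttpResponseMessage response = await client.GetAsync(webURL);
            string htmlCode = await response.Content.ReadAsStringAsync();

            Console.WriteLine("HTTP " + (int)response.StatusCode);
            Console.WriteLine(htmlCode);
        }
    }
}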