I am trying to find the index of "Mauricio" in a string that I download from a website using WebClient.DownloadString. However, the website spells the name with a foreign character: Maurício. So I found some code elsewhere
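    // Requires: using System.Globalization; using System.Linq; using System.Text;
    string ToASCII(string s)
    {
        // Decompose accented characters (FormD), then drop the combining marks.
        return String.Join("",
            s.Normalize(NormalizationForm.FormD)
             .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
    }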
that converts foreign characters. I have tested the code and it works. The problem is that when I download the string, the name comes through as MaurA-cio. I have tried both
    wc.Encoding = System.Text.Encoding.UTF8;
    wc.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
Neither stops it from downloading as MaurA-cio.
(Also, I cannot change the search term, as it comes from a list.)
What else can I try? Thanks
this one will work for any character
DownloadString doesn't look at HTTP response headers. It uses the previously set WebClient.Encoding property. If you have to use it, get the headers first:
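The snippet that followed this sentence is not preserved here, so what follows is only a rough sketch of one way to do it, not necessarily the code originally posted. It downloads the raw bytes with DownloadData, reads the Content-Type response header, and decodes with whatever charset the server declared. The class and method names (CharsetAwareDownload, EncodingFromContentType, Download) are made up for illustration, and the UTF-8 fallback is an assumption.

```csharp
using System;
using System.Net;
using System.Net.Mime;
using System.Text;

static class CharsetAwareDownload
{
    // Hypothetical helper: picks an encoding from a Content-Type header value,
    // falling back to UTF-8 (an assumption) when no usable charset is declared.
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (!string.IsNullOrWhiteSpace(contentType))
        {
            try
            {
                string charset = new ContentType(contentType).CharSet;
                if (!string.IsNullOrEmpty(charset))
                    return Encoding.GetEncoding(charset);
            }
            catch (FormatException) { }   // malformed Content-Type header
            catch (ArgumentException) { } // charset name not known to this runtime
        }
        return Encoding.UTF8;
    }

    // Downloads the raw bytes first, then decodes them using the charset
    // announced in the response headers instead of WebClient.Encoding.
    public static string Download(string url)
    {
        using (var wc = new WebClient())
        {
            byte[] raw = wc.DownloadData(url);
            string contentType = wc.ResponseHeaders[HttpResponseHeader.ContentType];
            return EncodingFromContentType(contentType).GetString(raw);
        }
    }
}
```

Once the text is decoded with the right charset, Maurício should arrive intact, and running the question's ToASCII over the result before calling IndexOf("Mauricio") should then find the match. If the server sends no charset at all, the page may declare one in a meta tag instead, which this sketch does not inspect.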
BTW, Unicode doesn't define "foreign" characters. From Maurício's perspective, "Mauricio" would be the foreign spelling of his name.