WebClient GodLikeClient = new WebClient();
HtmlAgilityPack.HtmlDocument GodLikeHTML = new HtmlAgilityPack.HtmlDocument();
So this code returns: "Skaitytojo klausimas psichologui: kas lemia homoseksualumÄ…? - Naujienų portalas Alfa.lt" instead of "Skaitytojo klausimas psichologui: kas lemia homoseksualumą? - Naujienų portalas Alfa.lt".
This webpage is encoded in Windows-1257 (Baltic), but textBox1.Text = GodLikeHTML.DocumentNode.OuterHtml;
returns distorted text: Baltic diacritics are turned into odd sequences several characters long :(
And yes, I've tried the HtmlAgilityPack forums. They do suck.
P.S. I'm no programmer, but I work on a community project and I really need to get this code working. Thanks ;}
I had similar encoding problems. I fixed them, in the most current version of HtmlAgilityPack, by adding the following to my WebClient initialization.
Hope it helps :)
This is my solution
If none of the other posts work for you, just use this:
WebUtility.HtmlDecode("Your html text");
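A small sketch of what that call does, in case it helps: WebUtility.HtmlDecode (in System.Net) turns HTML entities back into characters. As far as I know it only helps when the diacritics arrive as entities such as &#261;. It will not repair text that was already read with the wrong byte encoding. The class name and sample string below are made up for illustration.

```csharp
using System;
using System.Net;

class HtmlDecodeDemo
{
    static void Main()
    {
        // Hypothetical input: the letter "ą" written as a numeric HTML entity.
        string raw = "homoseksualum&#261;? &amp; more";

        // Entities are converted to the characters they stand for.
        string decoded = WebUtility.HtmlDecode(raw);

        // decoded now holds "homoseksualumą? & more".
        // How it prints depends on the console's own encoding.
        Console.WriteLine(decoded);
    }
}
```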
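Try changing the Load call to this (note the http:// prefix; without a scheme WebClient treats the string as a local file path):
GodLikeHTML.Load(GodLikeClient.OpenRead("http://www.alfa.lt/"), Encoding.GetEncoding(1257));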
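For anyone who wants it in one piece, here is a minimal sketch of that approach as a console program. It assumes the HtmlAgilityPack NuGet package is referenced. The http:// scheme, the //title XPath and the class name are my own assumptions, not taken from the question. On .NET Core / .NET 5+ code page 1257 should only be available after registering CodePagesEncodingProvider (System.Text.Encoding.CodePages package); on .NET Framework it is built in.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class TitleFetcher
{
    static void Main()
    {
        // Uncomment on .NET Core / .NET 5+ so that code page 1257 can be resolved:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Windows-1257 (Baltic), the encoding the question says the page uses.
        Encoding baltic = Encoding.GetEncoding(1257);

        using (var client = new WebClient())
        using (Stream stream = client.OpenRead("http://www.alfa.lt/"))
        {
            var doc = new HtmlDocument();

            // Tell HtmlAgilityPack how to decode the raw bytes
            // instead of letting it guess.
            doc.Load(stream, baltic);

            // SelectSingleNode returns null when nothing matches.
            HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
            Console.WriteLine(title != null ? title.InnerText : "(no <title> found)");
        }
    }
}
```

If the title still comes out garbled, the page is probably not really served in 1257, and it may be worth trying Encoding.UTF8 in the same place.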
UTF8 didn't work for me, but after setting the encoding like this, most pages I was trying to scrape worked just fine:
web.OverrideEncoding = Encoding.GetEncoding("ISO-8859-1");
Perhaps it might help someone.
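A rough sketch of how that line fits into a full download, assuming web is an HtmlAgilityPack HtmlWeb instance (the answer does not say so explicitly). The URL, the //title XPath and the class name are placeholders of mine.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

class OverrideEncodingDemo
{
    static void Main()
    {
        var web = new HtmlWeb();

        // Force this encoding for the response instead of relying on detection.
        // Swap in Encoding.UTF8 or Encoding.GetEncoding(1257) if the page needs it.
        web.OverrideEncoding = Encoding.GetEncoding("ISO-8859-1");

        // HtmlWeb downloads and parses the page in one step.
        HtmlDocument doc = web.Load("http://www.alfa.lt/");

        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title != null ? title.InnerText : "(no <title> found)");
    }
}
```

Which encoding is right depends on the site, so this is something to experiment with per page rather than a universal setting.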