FeedBurner changed the results returned by its blog service, so that it now returns blocks of JavaScript similar to:
document.write("\x3cdiv class\x3d\x22feedburnerFeedBlock\x22 id\x3d\x22RitterInsuranceMarketingRSSv3iugf6igask14fl8ok645b6l0\x22\x3e"); document.write("\x3cul\x3e"); document.write("\x3cli\x3e\x3cspan class\x3d\x22headline\x22\x3e\x3ca href\x3d\x22
I want the raw HTML out of this. Previously I could simply use .Replace to strip out the document.write syntax, but I can't figure out what kind of encoding this is, or at least how to decode it with C#.
Edit: Well, this was a semi-nightmare to finally solve. Here's what I came up with, in case anyone has any improvements to offer:
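public static char ConvertHexToASCII(this string hex)
{
    // Pass the parameter name, not its (null) value, to the exception.
    if (hex == null) throw new ArgumentNullException(nameof(hex));
    // Parse the two hex digits as a byte and map it to the matching character.
    return (char)Convert.ToByte(hex, 16);
}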
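private string DecodeFeedburnerHtml(string html)
{
    var builder = new StringBuilder(html.Length);
    // Holds the characters of a partially matched \xHH escape.
    var stack = new Stack<char>(4);
    foreach (var chr in html)
    {
        switch (chr)
        {
            case '\\':
                if (stack.Count == 0)
                {
                    // Possible start of an escape sequence.
                    stack.Push(chr);
                }
                else
                {
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            case 'x':
                if (stack.Count == 1)
                {
                    // Backslash followed by 'x': expect two hex digits next.
                    stack.Push(chr);
                }
                else
                {
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            default:
                if (stack.Count >= 2)
                {
                    stack.Push(chr);
                    if (stack.Count == 4)
                    {
                        // The top two entries are the hex digits. The first Pop()
                        // returns the second digit, so the format swaps them back.
                        string hexString = string.Format("{1}{0}", stack.Pop(), stack.Pop());
                        builder.Append(hexString.ConvertHexToASCII());
                        stack.Clear();
                    }
                }
                else
                {
                    // Not part of a \xHH escape: drop any pending backslash so a
                    // later 'x' is not mistaken for the start of an escape.
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
        }
    }
    return builder.ToString();
}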
I'm not sure what else I could do better. Code like this always feels really dirty to me, even though it's a linear-time algorithm; I guess that is related to how long it has to be.
That is a PHP Twig encoding:
http://www.twig-project.org/
Since you are using C#, you will most likely have to create a dictionary to translate the symbols and then use a series of .Replace() string methods to convert those back to HTML characters. Alternatively, you can save that data to a file, run a Perl script to decode the text, and then read from the file in C#, but that might be more costly.
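A minimal sketch of the dictionary-plus-Replace idea, assuming the feed only uses the handful of lowercase escapes visible in the question. The class name FeedburnerReplaceSketch and the contents of the table are hypothetical; any escape not listed is left untouched.

```csharp
using System;
using System.Collections.Generic;

public static class FeedburnerReplaceSketch
{
    // Hypothetical lookup table: escape sequence -> literal character.
    // string.Replace is case-sensitive, so "\x3C" would need its own entry.
    private static readonly Dictionary<string, string> Escapes = new Dictionary<string, string>
    {
        { @"\x3c", "<" },
        { @"\x3e", ">" },
        { @"\x3d", "=" },
        { @"\x22", "\"" },
        { @"\x26", "&" },
        { @"\x27", "'" },
    };

    public static string Decode(string text)
    {
        // One full pass over the string per table entry.
        foreach (var pair in Escapes)
        {
            text = text.Replace(pair.Key, pair.Value);
        }
        return text;
    }
}
```

With this table, FeedburnerReplaceSketch.Decode(@"\x3cul\x3e") returns <ul>. Each Replace call allocates a new string, so this is only reasonable for short input and a small table.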
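In .NET Core you can use Uri.UnescapeDataString(originalString.Replace(@"\x", "%")), which first turns the text into a URL-encoded (percent-encoded) string and then decodes it. Note that the backslash has to be written as @"\x" or "\\x"; a plain "\x" does not compile in C#.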
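A small sketch of that percent-decoding route. The class name PercentDecodeSketch is hypothetical, and the extra step that rewrites a literal % as %25 is an added assumption, not part of the original suggestion; without it, a % already present in the feed text could be misread as the start of an escape.

```csharp
using System;

public static class PercentDecodeSketch
{
    public static string Decode(string text)
    {
        // Protect literal '%' characters first (added assumption), then turn
        // every \x prefix into '%' so the text becomes percent-encoded.
        string percentEncoded = text.Replace("%", "%25").Replace(@"\x", "%");

        // Uri.UnescapeDataString decodes the %HH sequences.
        return Uri.UnescapeDataString(percentEncoded);
    }
}
```

For example, PercentDecodeSketch.Decode(@"50% \x3cb\x3e") returns 50% <b>. One caveat: Uri.UnescapeDataString treats the decoded bytes as UTF-8, so this sketch is only a good fit when the escapes stay in the ASCII range (\x00 to \x7f), as they do in the sample from the question.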
Those look like ASCII values, encoded in hex. You could traverse the string, and whenever you find a \x followed by two hexadecimal digits (0-9, a-f), replace it with the corresponding ASCII character. If the string is long, it would be faster to save the result incrementally to a StringBuilder instead of using String.Replace().
I don't know the encoding specification, but there might be more rules to follow (for example, if \\ is an escape sequence for a literal \).
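The traversal described above can be sketched with a regular expression instead of a hand-written loop. This is only an illustration under two assumptions: that \xHH and \\ are the only escapes that matter, and that each \xHH value maps directly to the character with that code. The class name HexEscapeDecoder is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class HexEscapeDecoder
{
    // Matches either an escaped backslash (\\) or \x followed by exactly
    // two hexadecimal digits, which are captured in group 1.
    private static readonly Regex Escape =
        new Regex(@"\\\\|\\x([0-9A-Fa-f]{2})", RegexOptions.Compiled);

    public static string Decode(string text)
    {
        // Regex.Replace scans the input once, left to right, and builds
        // the result incrementally.
        return Escape.Replace(text, match =>
        {
            if (!match.Groups[1].Success)
            {
                return "\\"; // "\\" in the input becomes a single backslash
            }
            int code = int.Parse(match.Groups[1].Value, NumberStyles.HexNumber);
            return ((char)code).ToString();
        });
    }
}
```

For example, HexEscapeDecoder.Decode(@"\x3cli\x3e a \\x3c") returns <li> a \x3c: the two real escapes are decoded, while the escaped backslash keeps the trailing x3c as literal text. Any other backslash sequence is passed through unchanged.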