I'm using Delphi 2009 and want to decode an HTML encoded string, for example:
&#39; -> '
But I cannot find any built-in function for doing this.
Thanks in advance
Look at the HTTPApp unit, which provides HTTPDecode and HTMLDecode (as well as the matching Encode functions). You should find it in your Source/Win32/Internet folder.
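A minimal sketch of calling the unit's function from a console program, assuming the Delphi 2009 declaration `function HTMLDecode(const AStr: String): String;` in HTTPApp. The program name `DecodeDemo` and the sample string are only illustrative, and the sample sticks to entities the stock routine is expected to recognise (`&amp;`, `&lt;`, `&gt;`, `&quot;`).

```pascal
program DecodeDemo; // hypothetical program name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils,
  HTTPApp; // declares HTMLDecode / HTMLEncode / HTTPDecode / HTTPEncode

var
  Encoded, Decoded: String;
begin
  // Sample input limited to the basic entities the stock routine handles.
  Encoded := 'Tom &amp; Jerry &lt;b&gt;';
  try
    Decoded := HTMLDecode(Encoded);
    Writeln(Decoded);
  except
    // Assumption: an unrecognised entity may raise an exception in the
    // stock implementation, so the call is guarded here.
    on E: Exception do
      Writeln('Decode failed: ', E.Message);
  end;
end.
```

Named entities outside that small set are probably not handled by the stock function, so test with the kind of input you actually expect before relying on it.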
Here's my HTMLDecode function (slightly modified from CodeGear's HTTPApp unit):
function HTMLDecode(const AStr: String): String;
var
  Sp, Rp, Cp, Tp: PChar;
  S: String;
  I, Code: Integer;
begin
  SetLength(Result, Length(AStr));
  Sp := PChar(AStr);
  Rp := PChar(Result);
  Cp := Sp;
  try
    while Sp^ <> #0 do
    begin
      case Sp^ of
        '&': begin
               Cp := Sp;
               Inc(Sp);
               case Sp^ of
                 'a': if AnsiStrPos(Sp, 'amp;') = Sp then { do not localize }
                      begin
                        Inc(Sp, 3);
                        Rp^ := '&';
                      end;
                 'l',
                 'g': if (AnsiStrPos(Sp, 'lt;') = Sp) or (AnsiStrPos(Sp, 'gt;') = Sp) then { do not localize }
                      begin
                        Cp := Sp;
                        Inc(Sp, 2);
                        while (Sp^ <> ';') and (Sp^ <> #0) do
                          Inc(Sp);
                        if Cp^ = 'l' then
                          Rp^ := '<'
                        else
                          Rp^ := '>';
                      end;
                 'n': if AnsiStrPos(Sp, 'nbsp;') = Sp then { do not localize }
                      begin
                        Inc(Sp, 4);
                        Rp^ := ' ';
                      end;
                 'q': if AnsiStrPos(Sp, 'quot;') = Sp then { do not localize }
                      begin
                        Inc(Sp, 4);
                        Rp^ := '"';
                      end;
                 '#': begin
                        Tp := Sp;
                        Inc(Tp);
                        while (Sp^ <> ';') and (Sp^ <> #0) do
                          Inc(Sp);
                        SetString(S, Tp, Sp - Tp);
                        Val(S, I, Code);
                        Rp^ := Chr((I));
                      end;
               else
                 Exit;
               end;
             end
      else
        Rp^ := Sp^;
      end;
      Inc(Rp);
      Inc(Sp);
    end;
  except
  end;
  SetLength(Result, Rp - PChar(Result));
end;
The HttpApp.HttpDecode function doesn't decode HTML entities (https://www.w3.org/TR/html4/sgml/entities.html#sym).
For example: &there4; → ∴
function HtmlDecode(s: UnicodeString): UnicodeString;
{
Public domain: No attribution required
Known issue: it doesn't handle entities with character code points above $FFFF (65535)
e.g.: &