While I can successfully encode and decode the user data part of an SMS message when a UDH is not present, I'm having trouble doing so when a UDH is present (in this case, for concatenated SMS).
When I decode or encode the user data, do I need to prepend the UDH to the text before doing so?
This article provides a sample encoding routine that compensates for the UDH with padding bits (which I still don't completely understand), but it doesn't show any data being passed to the routine, so I don't have a clear use case. I also could not find a decoding sample on the site: http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/.
So far, I have been able to get some results if I prepend the UDH to the user data before decoding it, but I suspect this is just a coincidence.
As an example (using values from https://en.wikipedia.org/wiki/Concatenated_SMS):
UDH := '050003000302';
ENCODED_USER_DATA_PART := 'D06536FB0DBABFE56C32'; // with padding, evidently
DecodedUserData := Decode7Bit(UDH + ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);
Output: "ß@ø¿Æ @hello world"
EncodedUserData := Encode7Bit(DecodedUserData);
DecodedUserData := Decode7Bit(EncodedUserData);
Writeln(DecodedUserData);
Same Output: "ß@ø¿Æ @hello world"
Without prepending the UDH I get garbage:
DecodedUserData := Decode7Bit(ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);
Output: "PKYY§An§eYI"
What is the correct way of handling this?
Am I supposed to include the UDH with the text when encoding the user data?
Am I supposed to strip off the garbage characters after decoding, or am I (as I suspect) completely off base with this assumption?
While the decoding algorithm here seems to work without a UDH, it doesn't seem to take any UDH information into account: Looking for GSM 7bit encode/decode algorithm.
I would be eternally grateful if someone could set me straight on the correct way to proceed. Any clear examples/code samples would be very much appreciated. ;-)
I will also provide a small sample application that includes the algorithms if anyone feels it will help solve the riddle.
EDIT 1:
I'm using Delphi XE2 Update 4 Hotfix 1
EDIT 2:
Thanks to help from @whosrdaddy, I was able to successfully get my encoding/decoding routines to work.
As a side note, I was curious as to why the user data needed to be on a 7-bit boundary when the UDH wasn't encoded with it, but the last sentence in the paragraph from the ETSI specification quoted by @whosrdaddy answered that:
If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header. This is to ensure that the SM itself starts on an octet boundary so that an earlier phase mobile will be capable of displaying the SM itself although the TP-UD Header in the TP-UD field may not be understood
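To make the arithmetic in that paragraph concrete, here is a minimal sketch of how the number of fill bits follows from the UDH length. The names `FillBits` and `UdhSeptets` are made up for this illustration and are not part of any library. The extra `mod 7` is there so the result stays 0 when the header already ends on a septet boundary (for example, a 7-octet UDH).

```pascal
program FillBitsDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: number of fill bits needed after a UDH of AUdhLen
// octets (counting the UDHL octet itself) so the text starts on a septet
// boundary.
function FillBits(AUdhLen: Integer): Integer;
begin
  Result := (7 - ((AUdhLen * 8) mod 7)) mod 7;
end;

// Hypothetical helper: number of septets taken up by the UDH plus fill bits.
function UdhSeptets(AUdhLen: Integer): Integer;
begin
  Result := ((AUdhLen * 8) + FillBits(AUdhLen)) div 7;
end;

begin
  // 6-octet UDH such as '050003000302': 48 bits + 1 fill bit = 7 septets
  Writeln(FillBits(6));        // 1
  Writeln(UdhSeptets(6));      // 7
  // 7-octet UDH: 56 bits is already 8 whole septets, so no fill bits
  Writeln(FillBits(7));        // 0
  Writeln(UdhSeptets(7));      // 8
  // Total septet count for a 6-octet UDH followed by the 11 septets
  // of 'hello world'
  Writeln(UdhSeptets(6) + 11); // 18
end.
```

For 7-bit data the user data length is counted in septets and covers the header, the fill bits and the text, which is why the last line adds the two together.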
My code is based in part on examples from the following resources:
Looking for GSM 7bit encode/decode algorithm
https://en.wikipedia.org/wiki/Concatenated_SMS
http://mobiletidings.com/2009/02/18/combining-sms-messages/
http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/
http://mobileforensics.files.wordpress.com/2007/06/understanding_sms.pdf
http://www.dreamfabric.com/sms/
http://www.mediaburst.co.uk/blog/concatenated-sms/
Here's the code for anyone else who's had trouble with SMS encoding/decoding. I'm sure it can be simplified/optimized (and comments are welcome), but I've tested it successfully with several different permutations and UDH lengths. I hope it helps.
unit SmsUtils;
interface
uses Windows, Classes, Math;
function Encode7Bit(const AText: string; AUdhLen: Byte;
out ATextLen: Byte): string;
function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
implementation
var
g7BitToAsciiTable: array [0 .. 127] of Byte;
gAsciiTo7BitTable: array [0 .. 255] of Byte;
procedure InitializeTables;
var
AsciiValue: Integer;
i: Integer;
begin
// create 7-bit to ascii table
g7BitToAsciiTable[0] := 64; // @
g7BitToAsciiTable[1] := 163;
g7BitToAsciiTable[2] := 36;
g7BitToAsciiTable[3] := 165;
g7BitToAsciiTable[4] := 232;
g7BitToAsciiTable[5] := 233; // é (223 is ß, which belongs at index 30)
g7BitToAsciiTable[6] := 249;
g7BitToAsciiTable[7] := 236;
g7BitToAsciiTable[8] := 242;
g7BitToAsciiTable[9] := 199;
g7BitToAsciiTable[10] := 10;
g7BitToAsciiTable[11] := 216;
g7BitToAsciiTable[12] := 248;
g7BitToAsciiTable[13] := 13;
g7BitToAsciiTable[14] := 197;
g7BitToAsciiTable[15] := 229;
g7BitToAsciiTable[16] := 0;
g7BitToAsciiTable[17] := 95;
g7BitToAsciiTable[18] := 0;
g7BitToAsciiTable[19] := 0;
g7BitToAsciiTable[20] := 0;
g7BitToAsciiTable[21] := 0;
g7BitToAsciiTable[22] := 0;
g7BitToAsciiTable[23] := 0;
g7BitToAsciiTable[24] := 0;
g7BitToAsciiTable[25] := 0;
g7BitToAsciiTable[26] := 0;
g7BitToAsciiTable[27] := 0;
g7BitToAsciiTable[28] := 198;
g7BitToAsciiTable[29] := 230;
g7BitToAsciiTable[30] := 223;
g7BitToAsciiTable[31] := 201;
g7BitToAsciiTable[32] := 32;
g7BitToAsciiTable[33] := 33;
g7BitToAsciiTable[34] := 34;
g7BitToAsciiTable[35] := 35;
g7BitToAsciiTable[36] := 164;
g7BitToAsciiTable[37] := 37;
g7BitToAsciiTable[38] := 38;
g7BitToAsciiTable[39] := 39;
g7BitToAsciiTable[40] := 40;
g7BitToAsciiTable[41] := 41;
g7BitToAsciiTable[42] := 42;
g7BitToAsciiTable[43] := 43;
g7BitToAsciiTable[44] := 44;
g7BitToAsciiTable[45] := 45;
g7BitToAsciiTable[46] := 46;
g7BitToAsciiTable[47] := 47;
g7BitToAsciiTable[48] := 48;
g7BitToAsciiTable[49] := 49;
g7BitToAsciiTable[50] := 50;
g7BitToAsciiTable[51] := 51;
g7BitToAsciiTable[52] := 52;
g7BitToAsciiTable[53] := 53;
g7BitToAsciiTable[54] := 54;
g7BitToAsciiTable[55] := 55;
g7BitToAsciiTable[56] := 56;
g7BitToAsciiTable[57] := 57;
g7BitToAsciiTable[58] := 58;
g7BitToAsciiTable[59] := 59;
g7BitToAsciiTable[60] := 60;
g7BitToAsciiTable[61] := 61;
g7BitToAsciiTable[62] := 62;
g7BitToAsciiTable[63] := 63;
g7BitToAsciiTable[64] := 161;
g7BitToAsciiTable[65] := 65;
g7BitToAsciiTable[66] := 66;
g7BitToAsciiTable[67] := 67;
g7BitToAsciiTable[68] := 68;
g7BitToAsciiTable[69] := 69;
g7BitToAsciiTable[70] := 70;
g7BitToAsciiTable[71] := 71;
g7BitToAsciiTable[72] := 72;
g7BitToAsciiTable[73] := 73;
g7BitToAsciiTable[74] := 74;
g7BitToAsciiTable[75] := 75;
g7BitToAsciiTable[76] := 76;
g7BitToAsciiTable[77] := 77;
g7BitToAsciiTable[78] := 78;
g7BitToAsciiTable[79] := 79;
g7BitToAsciiTable[80] := 80;
g7BitToAsciiTable[81] := 81;
g7BitToAsciiTable[82] := 82;
g7BitToAsciiTable[83] := 83;
g7BitToAsciiTable[84] := 84;
g7BitToAsciiTable[85] := 85;
g7BitToAsciiTable[86] := 86;
g7BitToAsciiTable[87] := 87;
g7BitToAsciiTable[88] := 88;
g7BitToAsciiTable[89] := 89;
g7BitToAsciiTable[90] := 90;
g7BitToAsciiTable[91] := 196;
g7BitToAsciiTable[92] := 214; // Ö
g7BitToAsciiTable[93] := 209;
g7BitToAsciiTable[94] := 220;
g7BitToAsciiTable[95] := 167;
g7BitToAsciiTable[96] := 191;
g7BitToAsciiTable[97] := 97;
g7BitToAsciiTable[98] := 98;
g7BitToAsciiTable[99] := 99;
g7BitToAsciiTable[100] := 100;
g7BitToAsciiTable[101] := 101;
g7BitToAsciiTable[102] := 102;
g7BitToAsciiTable[103] := 103;
g7BitToAsciiTable[104] := 104;
g7BitToAsciiTable[105] := 105;
g7BitToAsciiTable[106] := 106;
g7BitToAsciiTable[107] := 107;
g7BitToAsciiTable[108] := 108;
g7BitToAsciiTable[109] := 109;
g7BitToAsciiTable[110] := 110;
g7BitToAsciiTable[111] := 111;
g7BitToAsciiTable[112] := 112;
g7BitToAsciiTable[113] := 113;
g7BitToAsciiTable[114] := 114;
g7BitToAsciiTable[115] := 115;
g7BitToAsciiTable[116] := 116;
g7BitToAsciiTable[117] := 117;
g7BitToAsciiTable[118] := 118;
g7BitToAsciiTable[119] := 119;
g7BitToAsciiTable[120] := 120;
g7BitToAsciiTable[121] := 121;
g7BitToAsciiTable[122] := 122;
g7BitToAsciiTable[123] := 228;
g7BitToAsciiTable[124] := 246;
g7BitToAsciiTable[125] := 241;
g7BitToAsciiTable[126] := 252;
g7BitToAsciiTable[127] := 224;
// create ascii to 7-bit table
ZeroMemory(@gAsciiTo7BitTable, SizeOf(gAsciiTo7BitTable));
for i := 0 to High(g7BitToAsciiTable) do
begin
AsciiValue := g7BitToAsciiTable[i];
gAsciiTo7BitTable[AsciiValue] := i;
end;
end;
function ConvertAsciiTo7Bit(const AText: string; AUdhLen: Byte): AnsiString;
const
ESC = #27;
// #12 (form feed) is included so that the #12 branch of the case below is reachable
ESCAPED_ASCII_CODES = [#12, #94, #123, #125, #92, #91, #126, #93, #124, #164];
var
Septet: Byte;
Ch: AnsiChar;
i: Integer;
begin
Result := ''; // start from an empty string before appending septets
for i := 1 to Length(AText) do
begin
Ch := AnsiChar(AText[i]);
if not(Ch in ESCAPED_ASCII_CODES) then
Septet := gAsciiTo7BitTable[Byte(Ch)]
else
begin
Result := Result + ESC;
case (Ch) of
#12: Septet := 10;
#94: Septet := 20;
#123: Septet := 40;
#125: Septet := 41;
#92: Septet := 47;
#91: Septet := 60;
#126: Septet := 61;
#93: Septet := 62;
#124: Septet := 64;
#164: Septet := 101;
else Septet := 0;
end;
end;
Result := Result + AnsiChar(Septet);
end;
end;
function Convert7BitToAscii(const AText: AnsiString): string;
const
ESC = #27;
var
TextLen: Integer;
Ch: Char;
i: Integer;
begin
Result := '';
TextLen := Length(AText);
i := 1;
while (i <= TextLen) do
begin
Ch := Char(AText[i]);
if (Ch <> ESC) then
Result := Result + Char(g7BitToAsciiTable[Ord(Ch)])
else
begin
Inc(i); // skip ESC
if (i <= TextLen) then
begin
Ch := Char(AText[i]);
case (Ch) of
#10: Ch := #12;
#20: Ch := #94;
#40: Ch := #123;
#41: Ch := #125;
#47: Ch := #92;
#60: Ch := #91;
#61: Ch := #126;
#62: Ch := #93;
#64: Ch := #124;
#101: Ch := #164;
end;
Result := Result + Ch;
end;
end;
Inc(i);
end;
end;
function StrToHex(const AText: AnsiString): AnsiString; overload;
var
TextLen: Integer;
begin
// set the text buffer size
TextLen := Length(AText);
// set the length of the result to double the string length
SetLength(Result, TextLen * 2);
// convert the string to hex
BinToHex(PAnsiChar(AText), PAnsiChar(Result), TextLen);
end;
function StrToHex(const AText: string): string; overload;
begin
Result := string(StrToHex(AnsiString(AText)));
end;
function HexToStr(const AText: AnsiString): AnsiString; overload;
var
ResultLen: Integer;
begin
// set the length of the result to half the Text length
ResultLen := Length(AText) div 2;
SetLength(Result, ResultLen);
// convert the hex back into a string
if (HexToBin(PAnsiChar(AText), PAnsiChar(Result), ResultLen) <> ResultLen) then
Result := 'Error Converting Hex To String: ' + AText;
end;
function HexToStr(const AText: string): string; overload;
begin
Result := string(HexToStr(AnsiString(AText)));
end;
function Encode7Bit(const AText: string; AUdhLen: Byte;
out ATextLen: Byte): string;
// AText: Ascii text
// AUdhLen: Length of UDH including UDH Len byte (e.g. '050003CC0101' = 6 bytes)
// ATextLen: returns length of text that was encoded. This can be different
// than Length(AText) due to escape characters
// Returns text as encoded PDU hex string
var
Text7Bit: AnsiString;
Pdu: AnsiString;
PduIdx: Integer;
PduLen: Byte;
PaddingBits: Byte;
BitsToMove: Byte;
Septet: Byte;
Octet: Byte;
PrevOctet: Byte;
ShiftedOctet: Byte;
i: Integer;
begin
Result := '';
Text7Bit := ConvertAsciiTo7Bit(AText, AUdhLen);
ATextLen := Length(Text7Bit);
BitsToMove := 0;
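// determine how many padding bits are needed so that the text following the
// UDH starts on a septet boundary (0 if the UDH already ends on one, e.g.
// a 7-byte UDH)
if (AUdhLen > 0) then
PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7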
else
PaddingBits := 0;
// calculate the number of bytes needed to store the 7-bit text
// along with any padding bits that are required
PduLen := Ceil(((ATextLen * 7) + PaddingBits) / 8);
// reserve space for the PDU bytes
Pdu := AnsiString(StringOfChar(#0, PduLen));
PduIdx := 1;
for i := 1 to ATextLen do
begin
if (BitsToMove = 7) then
BitsToMove := 0
else
begin
// convert the current character to a septet (7-bits) and make room for
// the bits from the next one
Septet := (Byte(Text7Bit[i]) shr BitsToMove);
if (i = ATextLen) then
Octet := Septet
else
begin
// convert the next character to a septet and copy the bits from it
// to the octet (PDU byte)
Octet := Septet or
Byte((Byte(Text7Bit[i + 1]) shl Byte(7 - BitsToMove)));
end;
Byte(Pdu[PduIdx]) := Octet;
Inc(PduIdx);
Inc(BitsToMove);
end;
end;
// The following code pads the pdu on the *right* by shifting it to the *left*
// by <PaddingBits>. It does this by using the same bit storage convention as
// the 7-bit compression routine above, by taking the most significant
// <PaddingBits> from each PDU byte and moving them to the least significant
// bits of the next PDU byte. If there is no room in the last PDU byte for the
// high bits of the previous byte that were removed, then those bits are
// placed into an additional byte reserved for this purpose.
// Note: <PduLen> has already been set to account for the reserved byte if
// it is required.
if (PaddingBits > 0) then
begin
SetLength(Result, (PduLen * 2));
PrevOctet := 0;
for PduIdx := 1 to PduLen do
begin
Octet := Byte(Pdu[PduIdx]);
if (PduIdx = 1) then
ShiftedOctet := Byte(Octet shl PaddingBits)
else
ShiftedOctet := Byte(Octet shl PaddingBits) or
Byte(PrevOctet shr (8 - PaddingBits));
Byte(Pdu[PduIdx]) := ShiftedOctet;
PrevOctet := Octet;
end;
end;
Result := string(StrToHex(Pdu));
end;
function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
// APduData: Hex string representation of PDU data
// AUdhLen: Length of UDH including UDH Len (e.g. '050003CC0101' = 6 bytes)
// Returns decoded Ascii text
var
Pdu: AnsiString;
NumSeptets: Byte;
Septets: AnsiString;
PduIdx: Integer;
PduLen: Integer;
by: Byte;
currBy: Byte;
left: Byte;
mask: Byte;
nextBy: Byte;
Octet: Byte;
NextOctet: Byte;
PaddingBits: Byte;
ShiftedOctet: Byte;
i: Integer;
begin
Result := '';
PaddingBits := 0;
// convert hex string to bytes
Pdu := AnsiString(HexToStr(APduData));
PduLen := Length(Pdu);
// The following code removes the padding bits by shifting the PDU *right*
// by <PaddingBits>. It does this by taking the least significant
// <PaddingBits> bits from the following PDU byte and moving them to the most
// significant bits of the current PDU byte.
if (AUdhLen > 0) then
begin
// 0 if the UDH already ends on a septet boundary (e.g. a 7-byte UDH)
PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7;
for PduIdx := 1 to PduLen do
begin
Octet := Byte(Pdu[PduIdx]);
if (PduIdx = PduLen) then
ShiftedOctet := Byte(Octet shr PaddingBits)
else
begin
NextOctet := Byte(Pdu[PduIdx + 1]);
ShiftedOctet := Byte(Octet shr PaddingBits) or
Byte(NextOctet shl (8 - PaddingBits));
end;
Byte(Pdu[PduIdx]) := ShiftedOctet;
end;
end;
// decode
// number of septets in PDU after excluding the padding bits
NumSeptets := ((PduLen * 8) - PaddingBits) div 7;
Septets := AnsiString(StringOfChar(#0, NumSeptets));
left := 7;
mask := $7F;
nextBy := 0;
PduIdx := 1;
for i := 1 to NumSeptets do
begin
if mask = 0 then
begin
Septets[i] := AnsiChar(nextBy);
left := 7;
mask := $7F;
nextBy := 0;
end
else
begin
if (PduIdx > PduLen) then
Break;
by := Byte(Pdu[PduIdx]);
Inc(PduIdx);
currBy := ((by AND mask) SHL (7 - left)) OR nextBy;
nextBy := (by AND (NOT mask)) SHR left;
Septets[i] := AnsiChar(currBy);
mask := mask SHR 1;
left := left - 1;
end;
end; // for
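// remove last character if unused
// this is kind of a hack: a trailing 0 septet may just be leftover bits from
// the last octet, but 0 is also the code for '@', so a real trailing '@' gets
// stripped too. Passing in the septet count from the TP-UDL field would avoid
// the guesswork.
if (NumSeptets > 0) and (Septets[NumSeptets] = #0) then
SetLength(Septets, NumSeptets - 1);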
// convert 7-bit alphabet to ascii
Result := Convert7BitToAscii(Septets);
end;
initialization
InitializeTables;
end.
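As a usage sketch, the values from the question can be run through the two public routines. This assumes the unit above is saved as SmsUtils.pas and is on the project's search path; the program name `SmsUtilsDemo` and the constant names are made up for the example.

```pascal
program SmsUtilsDemo;

{$APPTYPE CONSOLE}

uses
  SmsUtils; // assumption: the unit listed above, saved as SmsUtils.pas

const
  // 05 = UDHL, 00 03 = concatenation IE and its length,
  // 00 03 02 = reference, total parts, part number
  UDH = '050003000302';
  UDH_LEN = 6; // octets, including the UDHL octet itself
  ENCODED_USER_DATA_PART = 'D06536FB0DBABFE56C32';

var
  Encoded: string;
  Decoded: string;
  TextLen: Byte;
begin
  // The UDH is NOT prepended; only its length is passed so the routine can
  // work out how many padding bits to drop.
  Decoded := Decode7Bit(ENCODED_USER_DATA_PART, UDH_LEN);
  Writeln(Decoded); // hello world

  // Encoding with the same UDH length gives back the original hex string.
  Encoded := Encode7Bit(Decoded, UDH_LEN, TextLen);
  Writeln(Encoded); // D06536FB0DBABFE56C32
  Writeln(TextLen); // 11

  // The UDH octets go in front of the packed text unchanged.
  Writeln(UDH + Encoded); // 050003000302D06536FB0DBABFE56C32
end.
```

The point of the example is that the UDH itself is never 7-bit packed or unpacked; it only determines the padding between the header and the text.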