When I encode/decode SMS PDU (GSM 7 Bit) user data

Posted 2019-03-11 08:58

Question:

While I can successfully encode and decode the user data part of an SMS message when a UDH is not present, I'm having trouble doing so when a UDH is present (in this case, for concatenated SMS).

When I decode or encode the user data, do I need to prepend the UDH to the text before doing so?

This article provides an encoding routine sample that compensates for the UDH with padding bits (which I still don't completely understand) but it doesn't give an example of data being passed to the routine so I don't have a clear use case (and I could not find a decoding sample on the site): http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/.

So far, I have been able to get some results if I prepend the UDH to the user data before decoding it, but I suspect this is just a coincidence.

As an example (using values from https://en.wikipedia.org/wiki/Concatenated_SMS):

UDH := '050003000302';
ENCODED_USER_DATA_PART := 'D06536FB0DBABFE56C32'; // with padding, evidently
DecodedUserData := Decode7Bit(UDH + ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);

Output: "ß@ø¿Æ @hello world"

EncodedUserData := Encode7Bit(DecodedUserData);
DecodedUserData := Decode7Bit(EncodedUserData);
Writeln(DecodedUserData);

Same Output: "ß@ø¿Æ @hello world"

Without prepending the UDH I get garbage:

DecodedUserData := Decode7Bit(ENCODED_USER_DATA_PART);
Writeln(DecodedUserData);

Output: "PKYY§An§eYI"

What is the correct way of handling this?

Am I supposed to include the UDH with the text when encoding the user data?

Am I supposed to strip off the garbage characters after decoding, or am I (as I suspect) completely off base with this assumption?

While the decoding algorithm here seems to work without a UDH it doesn't seem to take any UDH information into account: Looking for GSM 7bit encode/decode algorithm.

I would be eternally grateful if someone could set me straight on the correct way to proceed. Any clear examples/code samples would be very much appreciated. ;-)

I will also provide a small sample application that includes the algorithms if anyone feels it will help solve the riddle.

EDIT 1:

I'm using Delphi XE2 Update 4 Hotfix 1

EDIT 2:

Thanks to help from @whosrdaddy, I was able to successfully get my encoding/decoding routines to work.

As a side note, I was curious as to why the user data needed to be on a 7-bit boundary when the UDH wasn't encoded with it, but the last sentence in the paragraph from the ETSI specification quoted by @whosrdaddy answered that:

If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header. This is to ensure that the SM itself starts on an octet boundary so that an earlier phase mobile will be capable of displaying the SM itself although the TP-UD Header in the TP-UD field may not be understood

My code is based in part on examples from the following resources:

Looking for GSM 7bit encode/decode algorithm

https://en.wikipedia.org/wiki/Concatenated_SMS

http://mobiletidings.com/2009/02/18/combining-sms-messages/

http://mobiletidings.com/2009/07/06/how-to-pack-gsm7-into-septets/

http://mobileforensics.files.wordpress.com/2007/06/understanding_sms.pdf

http://www.dreamfabric.com/sms/

http://www.mediaburst.co.uk/blog/concatenated-sms/

Here's the code for anyone else who's had trouble with SMS encoding/decoding. I'm sure it can be simplified/optimized (and comments are welcome), but I've tested it with several different permutations and UDH header lengths with success. I hope it helps.

unit SmsUtils;

interface

uses Windows, Classes, Math;

function Encode7Bit(const AText: string; AUdhLen: Byte;
  out ATextLen: Byte): string;

function Decode7Bit(const APduData: string; AUdhLen: Integer): string;

implementation

var
  g7BitToAsciiTable: array [0 .. 127] of Byte;
  gAsciiTo7BitTable: array [0 .. 255] of Byte;

procedure InitializeTables;
var
  AsciiValue: Integer;
  i: Integer;
begin
  // create 7-bit to ascii table
  g7BitToAsciiTable[0] := 64; // @
  g7BitToAsciiTable[1] := 163;
  g7BitToAsciiTable[2] := 36;
  g7BitToAsciiTable[3] := 165;
  g7BitToAsciiTable[4] := 232;
  g7BitToAsciiTable[5] := 233; // é
  g7BitToAsciiTable[6] := 249;
  g7BitToAsciiTable[7] := 236;
  g7BitToAsciiTable[8] := 242;
  g7BitToAsciiTable[9] := 199;
  g7BitToAsciiTable[10] := 10;
  g7BitToAsciiTable[11] := 216;
  g7BitToAsciiTable[12] := 248;
  g7BitToAsciiTable[13] := 13;
  g7BitToAsciiTable[14] := 197;
  g7BitToAsciiTable[15] := 229;
  g7BitToAsciiTable[16] := 0;
  g7BitToAsciiTable[17] := 95;
  g7BitToAsciiTable[18] := 0;
  g7BitToAsciiTable[19] := 0;
  g7BitToAsciiTable[20] := 0;
  g7BitToAsciiTable[21] := 0;
  g7BitToAsciiTable[22] := 0;
  g7BitToAsciiTable[23] := 0;
  g7BitToAsciiTable[24] := 0;
  g7BitToAsciiTable[25] := 0;
  g7BitToAsciiTable[26] := 0;
  g7BitToAsciiTable[27] := 0;
  g7BitToAsciiTable[28] := 198;
  g7BitToAsciiTable[29] := 230;
  g7BitToAsciiTable[30] := 223;
  g7BitToAsciiTable[31] := 201;
  g7BitToAsciiTable[32] := 32;
  g7BitToAsciiTable[33] := 33;
  g7BitToAsciiTable[34] := 34;
  g7BitToAsciiTable[35] := 35;
  g7BitToAsciiTable[36] := 164;
  g7BitToAsciiTable[37] := 37;
  g7BitToAsciiTable[38] := 38;
  g7BitToAsciiTable[39] := 39;
  g7BitToAsciiTable[40] := 40;
  g7BitToAsciiTable[41] := 41;
  g7BitToAsciiTable[42] := 42;
  g7BitToAsciiTable[43] := 43;
  g7BitToAsciiTable[44] := 44;
  g7BitToAsciiTable[45] := 45;
  g7BitToAsciiTable[46] := 46;
  g7BitToAsciiTable[47] := 47;
  g7BitToAsciiTable[48] := 48;
  g7BitToAsciiTable[49] := 49;
  g7BitToAsciiTable[50] := 50;
  g7BitToAsciiTable[51] := 51;
  g7BitToAsciiTable[52] := 52;
  g7BitToAsciiTable[53] := 53;
  g7BitToAsciiTable[54] := 54;
  g7BitToAsciiTable[55] := 55;
  g7BitToAsciiTable[56] := 56;
  g7BitToAsciiTable[57] := 57;
  g7BitToAsciiTable[58] := 58;
  g7BitToAsciiTable[59] := 59;
  g7BitToAsciiTable[60] := 60;
  g7BitToAsciiTable[61] := 61;
  g7BitToAsciiTable[62] := 62;
  g7BitToAsciiTable[63] := 63;
  g7BitToAsciiTable[64] := 161;
  g7BitToAsciiTable[65] := 65;
  g7BitToAsciiTable[66] := 66;
  g7BitToAsciiTable[67] := 67;
  g7BitToAsciiTable[68] := 68;
  g7BitToAsciiTable[69] := 69;
  g7BitToAsciiTable[70] := 70;
  g7BitToAsciiTable[71] := 71;
  g7BitToAsciiTable[72] := 72;
  g7BitToAsciiTable[73] := 73;
  g7BitToAsciiTable[74] := 74;
  g7BitToAsciiTable[75] := 75;
  g7BitToAsciiTable[76] := 76;
  g7BitToAsciiTable[77] := 77;
  g7BitToAsciiTable[78] := 78;
  g7BitToAsciiTable[79] := 79;
  g7BitToAsciiTable[80] := 80;
  g7BitToAsciiTable[81] := 81;
  g7BitToAsciiTable[82] := 82;
  g7BitToAsciiTable[83] := 83;
  g7BitToAsciiTable[84] := 84;
  g7BitToAsciiTable[85] := 85;
  g7BitToAsciiTable[86] := 86;
  g7BitToAsciiTable[87] := 87;
  g7BitToAsciiTable[88] := 88;
  g7BitToAsciiTable[89] := 89;
  g7BitToAsciiTable[90] := 90;
  g7BitToAsciiTable[91] := 196;
  g7BitToAsciiTable[92] := 214; // Ö
  g7BitToAsciiTable[93] := 209;
  g7BitToAsciiTable[94] := 220;
  g7BitToAsciiTable[95] := 167;
  g7BitToAsciiTable[96] := 191;
  g7BitToAsciiTable[97] := 97;
  g7BitToAsciiTable[98] := 98;
  g7BitToAsciiTable[99] := 99;
  g7BitToAsciiTable[100] := 100;
  g7BitToAsciiTable[101] := 101;
  g7BitToAsciiTable[102] := 102;
  g7BitToAsciiTable[103] := 103;
  g7BitToAsciiTable[104] := 104;
  g7BitToAsciiTable[105] := 105;
  g7BitToAsciiTable[106] := 106;
  g7BitToAsciiTable[107] := 107;
  g7BitToAsciiTable[108] := 108;
  g7BitToAsciiTable[109] := 109;
  g7BitToAsciiTable[110] := 110;
  g7BitToAsciiTable[111] := 111;
  g7BitToAsciiTable[112] := 112;
  g7BitToAsciiTable[113] := 113;
  g7BitToAsciiTable[114] := 114;
  g7BitToAsciiTable[115] := 115;
  g7BitToAsciiTable[116] := 116;
  g7BitToAsciiTable[117] := 117;
  g7BitToAsciiTable[118] := 118;
  g7BitToAsciiTable[119] := 119;
  g7BitToAsciiTable[120] := 120;
  g7BitToAsciiTable[121] := 121;
  g7BitToAsciiTable[122] := 122;
  g7BitToAsciiTable[123] := 228;
  g7BitToAsciiTable[124] := 246;
  g7BitToAsciiTable[125] := 241;
  g7BitToAsciiTable[126] := 252;
  g7BitToAsciiTable[127] := 224;

  // create ascii to 7-bit table
  ZeroMemory(@gAsciiTo7BitTable, SizeOf(gAsciiTo7BitTable));
  for i := 0 to High(g7BitToAsciiTable) do
  begin
    AsciiValue := g7BitToAsciiTable[i];
    gAsciiTo7BitTable[AsciiValue] := i;
  end;
end;

function ConvertAsciiTo7Bit(const AText: string; AUdhLen: Byte): AnsiString;
const
  ESC = #27;
  // #12 (form feed) is included so that its branch in the case statement below is reachable
  ESCAPED_ASCII_CODES = [#12, #94, #123, #125, #92, #91, #126, #93, #124, #164];
var
  Septet: Byte;
  Ch: AnsiChar;
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(AText) do
  begin
    Ch := AnsiChar(AText[i]);
    if not(Ch in ESCAPED_ASCII_CODES) then
      Septet := gAsciiTo7BitTable[Byte(Ch)]
    else
    begin
      Result := Result + ESC;
      case (Ch) of
        #12: Septet := 10;
        #94: Septet := 20;
        #123: Septet := 40;
        #125: Septet := 41;
        #92: Septet := 47;
        #91: Septet := 60;
        #126: Septet := 61;
        #93: Septet := 62;
        #124: Septet := 64;
        #164: Septet := 101;
      else Septet := 0;
      end;
    end;
    Result := Result + AnsiChar(Septet);
  end;
end;

function Convert7BitToAscii(const AText: AnsiString): string;
const
  ESC = #27;
var
  TextLen: Integer;
  Ch: Char;
  i: Integer;
begin
  Result := '';
  TextLen := Length(AText);
  i := 1;
  while (i <= TextLen) do
  begin
    Ch := Char(AText[i]);
    if (Ch <> ESC) then
      Result := Result + Char(g7BitToAsciiTable[Ord(Ch)])
    else
    begin
      Inc(i); // skip ESC
      if (i <= TextLen) then
      begin
        Ch := Char(AText[i]);
        case (Ch) of
          #10: Ch := #12;
          #20: Ch := #94;
          #40: Ch := #123;
          #41: Ch := #125;
          #47: Ch := #92;
          #60: Ch := #91;
          #61: Ch := #126;
          #62: Ch := #93;
          #64: Ch := #124;
          #101: Ch := #164;
        end;
        Result := Result + Ch;
      end;
    end;
    Inc(i);
  end;
end;

function StrToHex(const AText: AnsiString): AnsiString; overload;
var
  TextLen: Integer;
begin
  // set the text buffer size
  TextLen := Length(AText);
  // set the length of the result to double the string length
  SetLength(Result, TextLen * 2);
  // convert the string to hex
  BinToHex(PAnsiChar(AText), PAnsiChar(Result), TextLen);
end;

function StrToHex(const AText: string): string; overload;
begin
  Result := string(StrToHex(AnsiString(AText)));
end;

function HexToStr(const AText: AnsiString): AnsiString; overload;
var
  ResultLen: Integer;
begin
  // set the length of the result to half the Text length
  ResultLen := Length(AText) div 2;
  SetLength(Result, ResultLen);
  // convert the hex back into a string
  if (HexToBin(PAnsiChar(AText), PAnsiChar(Result), ResultLen) <> ResultLen) then
    Result := 'Error Converting Hex To String: ' + AText;
end;

function HexToStr(const AText: string): string; overload;
begin
  Result := string(HexToStr(AnsiString(AText)));
end;

function Encode7Bit(const AText: string; AUdhLen: Byte;
  out ATextLen: Byte): string;
// AText: Ascii text
// AUdhLen: Length of UDH including UDH Len byte (e.g. '050003CC0101' = 6 bytes)
// ATextLen: returns length of text that was encoded.  This can be different
// than Length(AText) due to escape characters
// Returns text as encoded PDU hex string
var
  Text7Bit: AnsiString;
  Pdu: AnsiString;
  PduIdx: Integer;
  PduLen: Byte;
  PaddingBits: Byte;
  BitsToMove: Byte;
  Septet: Byte;
  Octet: Byte;
  PrevOctet: Byte;
  ShiftedOctet: Byte;
  i: Integer;
begin
  Result := '';
  Text7Bit := ConvertAsciiTo7Bit(AText, AUdhLen);
  ATextLen := Length(Text7Bit);
  BitsToMove := 0;
  // determine how many padding bits needed based on the UDH
  if (AUdhLen > 0) then
    PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7 // 0 if the UDH is already septet-aligned
  else
    PaddingBits := 0;
  // calculate the number of bytes needed to store the 7-bit text
  // along with any padding bits that are required
  PduLen := Ceil(((ATextLen * 7) + PaddingBits) / 8);
  // reserve space for the PDU bytes
  Pdu := AnsiString(StringOfChar(#0, PduLen));
  PduIdx := 1;
  for i := 1 to ATextLen do
  begin
    if (BitsToMove = 7) then
      BitsToMove := 0
    else
    begin
      // convert the current character to a septet (7-bits) and make room for
      // the bits from the next one
      Septet := (Byte(Text7Bit[i]) shr BitsToMove);
      if (i = ATextLen) then
        Octet := Septet
      else
      begin
        // convert the next character to a septet and copy the bits from it
        // to the octet (PDU byte)
        Octet := Septet or
          Byte((Byte(Text7Bit[i + 1]) shl Byte(7 - BitsToMove)));
      end;
      Byte(Pdu[PduIdx]) := Octet;
      Inc(PduIdx);
      Inc(BitsToMove);
    end;
  end;
  // The following code pads the pdu on the *right* by shifting it to the *left*
  // by <PaddingBits>. It does this by using the same bit storage convention as
  // the 7-bit compression routine above, by taking the most significant
  // <PaddingBits> from each PDU byte and moving them to the least significant
  // bits of the next PDU byte. If there is no room in the last PDU byte for the
  // high bits of the previous byte that were removed, then those bits are
  // placed into an additional byte reserved for this purpose.
  // Note: <PduLen> has already been set to account for the reserved byte if
  // it is required.
  if (PaddingBits > 0) then
  begin
    SetLength(Result, (PduLen * 2));
    PrevOctet := 0;
    for PduIdx := 1 to PduLen do
    begin
      Octet := Byte(Pdu[PduIdx]);
      if (PduIdx = 1) then
        ShiftedOctet := Byte(Octet shl PaddingBits)
      else
        ShiftedOctet := Byte(Octet shl PaddingBits) or
          Byte(PrevOctet shr (8 - PaddingBits));
      Byte(Pdu[PduIdx]) := ShiftedOctet;
      PrevOctet := Octet;
    end;
  end;
  Result := string(StrToHex(Pdu));
end;

function Decode7Bit(const APduData: string; AUdhLen: Integer): string;
// APduData: Hex string representation of PDU data
// AUdhLen: Length of UDH including UDH Len (e.g. '050003CC0101' = 6 bytes)
// Returns decoded Ascii text
var
  Pdu: AnsiString;
  NumSeptets: Byte;
  Septets: AnsiString;
  PduIdx: Integer;
  PduLen: Integer;
  by: Byte;
  currBy: Byte;
  left: Byte;
  mask: Byte;
  nextBy: Byte;
  Octet: Byte;
  NextOctet: Byte;
  PaddingBits: Byte;
  ShiftedOctet: Byte;
  i: Integer;
begin
  Result := '';
  PaddingBits := 0;
  // convert hex string to bytes
  Pdu := AnsiString(HexToStr(APduData));
  PduLen := Length(Pdu);
  // The following code removes the fill bits at the start of the PDU by
  // shifting it *right* by <PaddingBits>. It does this by taking the least
  // significant <PaddingBits> bits from the following PDU byte and moving them
  // to the most significant bits of the current PDU byte.
  if (AUdhLen > 0) then
  begin
    PaddingBits := (7 - ((AUdhLen * 8) mod 7)) mod 7; // 0 if the UDH is already septet-aligned
    for PduIdx := 1 to PduLen do
    begin
      Octet := Byte(Pdu[PduIdx]);
      if (PduIdx = PduLen) then
        ShiftedOctet := Byte(Octet shr PaddingBits)
      else
      begin
        NextOctet := Byte(Pdu[PduIdx + 1]);
        ShiftedOctet := Byte(Octet shr PaddingBits) or
          Byte(NextOctet shl (8 - PaddingBits));
      end;
      Byte(Pdu[PduIdx]) := ShiftedOctet;
    end;
  end;
  // decode
  // number of septets in PDU after excluding the padding bits
  NumSeptets := ((PduLen * 8) - PaddingBits) div 7;
  Septets := AnsiString(StringOfChar(#0, NumSeptets));
  left := 7;
  mask := $7F;
  nextBy := 0;
  PduIdx := 1;
  for i := 1 to NumSeptets do
  begin
    if mask = 0 then
    begin
      Septets[i] := AnsiChar(nextBy);
      left := 7;
      mask := $7F;
      nextBy := 0;
    end
    else
    begin
      if (PduIdx > PduLen) then
        Break;
      by := Byte(Pdu[PduIdx]);
      Inc(PduIdx);
      currBy := ((by AND mask) SHL (7 - left)) OR nextBy;
      nextBy := (by AND (NOT mask)) SHR left;
      Septets[i] := AnsiChar(currBy);
      mask := mask SHR 1;
      left := left - 1;
    end;
  end; // for
  // remove last character if unused
  // this is kind of a hack, but frankly I don't know how else to compensate
  // for it.
  if (Septets[NumSeptets] = #0) then
    SetLength(Septets, NumSeptets - 1);
  // convert 7-bit alphabet to ascii
  Result := Convert7BitToAscii(Septets);
end;

initialization
  InitializeTables;
end.

Answer 1:

No, you don't include the UDH part when encoding. However, page 57 of the GSM Phase 2 specification mentions this: "If 7 bit data is used and the TP-UD-Header does not finish on a septet boundary then fill bits are inserted after the last Information Element Data octet so that there is an integral number of septets for the entire TP-UD header". When you include a UDH part, it may not end on a septet boundary, so all you need to do is calculate the offset (= number of fill bits).

Calculating the offset (this code assumes that UDHPart is an AnsiString holding the UDH as hex digits, so its length in octets is half the string length):

Len := Length(UDHPart) shr 1;
Offset := (7 - ((Len * 8) mod 7)) mod 7;  // fill bits; 0 if the UDH already ends on a septet boundary
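As a worked check of the fill-bit arithmetic, here is a minimal console sketch. The function names FillBits and UdhSeptets are made up for illustration and are not part of any library. The outer "mod 7" is there so that a header that is already septet-aligned (for example 7 octets = 56 bits) gets 0 fill bits rather than 7.

```pascal
program FillBitsDemo;

{$APPTYPE CONSOLE}

// Hypothetical helper: number of fill bits needed after a UDH of UdhLen
// octets (UDHL octet included) so that the header ends on a septet boundary.
function FillBits(UdhLen: Integer): Integer;
begin
  Result := (7 - ((UdhLen * 8) mod 7)) mod 7;
end;

// Hypothetical helper: number of septets taken up by the header plus its
// fill bits.
function UdhSeptets(UdhLen: Integer): Integer;
begin
  Result := (UdhLen * 8 + FillBits(UdhLen)) div 7;
end;

var
  UdhLen: Integer;
begin
  // 160 is the number of septets that fit into the 140 octets of user data
  for UdhLen := 0 to 7 do
    Writeln(UdhLen, ' octets -> ', FillBits(UdhLen), ' fill bits, ',
      UdhSeptets(UdhLen), ' septets, ', 160 - UdhSeptets(UdhLen),
      ' text septets left');
end.
```

For the 6-octet concatenation header used in the question this gives 1 fill bit and 7 header septets, which leaves the familiar 153 characters per message part.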

Now, when encoding the 7-bit data, you proceed as normal, but at the end you shift the data Offset bits to the left. This code has the encoded data in the variable Result (AnsiString):

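 // fill bits
 if Offset > 0 then
  begin
   v := Result;
   Len := Length(v);
   // SeptetCount (a Integer, assumed to be declared and set by the caller) is the
   // number of septets packed into v; Length(v) counts octets, so it cannot
   // be used to size the shifted result
   BytesRemain := Ceil(((SeptetCount * 7) + Offset) / 8);
   Result := StringOfChar(#0, BytesRemain);
   for InPos := 1 to BytesRemain do
    begin
     // the shifted data may need one octet more than v holds
     // (Cur is a Byte variable, assumed to be declared)
     if InPos <= Len then
      Cur := Byte(v[InPos])
     else
      Cur := 0;
     if InPos = 1 then
      Byte(Result[InPos]) := Byte(Cur shl Offset)
     else
      Byte(Result[InPos]) := Byte(Cur shl Offset) or (Byte(v[InPos-1]) shr (8 - Offset));
    end;
  end;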

Decoding is the same thing in reverse: you first shift the 7-bit data Offset bits to the right, then decode as usual.

I hope this will set you onto the right track...



Answer 2:

In your case the data is D06536FB0DBABFE56C32.

The first octet is D0. Its upper 7 bits give "h"; the lowest bit is the fill bit and is not used.

The rest is 6536FB0DBABFE56C32.

In binary:

(01100101)0011011011111011000011011011101010111111111001010110110000110010

Reverse the byte order, then read from right to left: each group of 7 bits is one character.

001100100110110011100101101111111011101000001101111 1101100 110110(0 1100101)

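Taking 7 bits at a time from the right gives the septets below. You could read the string straight from the line above, but I list them in reading order to make it easier to see :D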

(1100101)(1101100)(1101100)(1101111)(0100000)(1110111)(1101111)(1110010)(1101100)(1100100)00

And the string is "ello world".

Combined with the first character, you get "hello world".
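The manual walk-through above can also be sketched in code. This is only an illustration under some assumptions: UnpackSeptets is a made-up helper (not the question's Decode7Bit), the septet count of 11 is hard-coded where it would normally come from TP-UDL minus the septets taken up by the header, and the septets are turned into characters with Chr only because every character of this sample has the same code in ASCII and in the GSM alphabet.

```pascal
program UnpackDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: reads Count septets from Data, starting BitOffset bits
// into the buffer. Bits are counted from the least significant bit of each
// octet, which matches the GSM 7-bit packing order.
// The caller must make sure that BitOffset + Count * 7 does not exceed the
// number of bits in Data.
function UnpackSeptets(const Data: TBytes; BitOffset, Count: Integer): TBytes;
var
  i, BitPos, ByteIdx, Shift: Integer;
  Pair: Word;
begin
  SetLength(Result, Count);
  for i := 0 to Count - 1 do
  begin
    BitPos := BitOffset + i * 7;
    ByteIdx := BitPos div 8;
    Shift := BitPos mod 8;
    // combine the current octet with the next one (if there is one) so that
    // a septet straddling an octet boundary comes out with a single shift
    Pair := Data[ByteIdx];
    if ByteIdx + 1 < Length(Data) then
      Pair := Pair or (Word(Data[ByteIdx + 1]) shl 8);
    Result[i] := (Pair shr Shift) and $7F;
  end;
end;

var
  Data, Septets: TBytes;
  Decoded: string;
  i: Integer;
begin
  // user data from the question: the octets that follow the 6-octet UDH
  Data := TBytes.Create($D0, $65, $36, $FB, $0D, $BA, $BF, $E5, $6C, $32);
  // a 6-octet UDH needs 1 fill bit; the text is 11 septets long (assumed known)
  Septets := UnpackSeptets(Data, 1, 11);
  Decoded := '';
  for i := 0 to High(Septets) do
    Decoded := Decoded + Chr(Septets[i]); // ASCII-compatible septets only
  Writeln(Decoded); // prints "hello world"
end.
```

Reading from a bit offset with a known septet count avoids shifting the whole buffer first, and it also avoids having to guess whether a trailing zero septet is padding or a real "@" character.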