This question was out there for a while and I thought I should offer some bonus points if I can get it to work.
What did I do…
Recently at work, I wrote a parser that converts a binary file into a readable format. The binary file isn't an ASCII file containing 10101010 characters; it is encoded in binary. So if I do a cat on the file, I get the following -
[jaypal~/Temp/GTP]$ cat T20111017153052.NEW
==?sGTP?ղ?N????W????&Xx1?T?&Xx1?;
?d@#e?
?0H????????|?X?@@(?ղ??VtPOC01
cceE??k@9??W傇??R?K?i2??d@#e???&Xx1&Xx??!?
blackberrynet?/??!
??!
??#ripassword??W傅?W傆??0H??
#R??@Vtc@@(?ղ??n?POC01
So I used the hexdump utility to display the file as shown below and redirected the output to a file. Now I had an output file that was a text file containing hex values.
[jaypal~/Temp/GTP]$ hexdump -C T20111017153052.NEW
00000000 3d 3d 01 f8 73 47 54 50 02 f1 d5 b2 be 4e e4 d7 |==..sGTP.....N..|
00000010 00 01 01 00 01 80 00 cc 57 e5 82 00 00 00 00 00 |........W.......|
00000020 00 00 00 00 00 00 00 00 87 d3 f5 13 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 10 |................|
00000040 01 01 0f 00 00 00 00 00 26 58 78 31 00 b3 54 c5 |........&Xx1..T.|
00000050 26 58 78 31 00 b4 3b 0a 00 00 ad 64 13 40 01 03 |&Xx1..;....d.@..|
00000060 23 16 65 f3 01 01 0b 91 30 19 48 99 f2 ff ff ff |#.e.....0.H.....|
00000070 ff ff ff 02 00 7c 00 dc 01 58 00 a0 40 40 28 02 |.....|...X..@@(.|
00000080 f1 d5 b2 b8 ca 56 74 50 4f 43 30 31 00 00 00 00 |.....VtPOC01....|
00000090 00 04 0a 63 63 07 00 00 00 00 00 00 00 00 00 00 |...cc...........|
000000a0 00 00 00 65 45 00 00 b4 fb 6b 40 00 39 11 16 cd |...eE....k@.9...|
000000b0 cc 57 e5 82 87 d3 f5 52 85 a1 08 4b 00 a0 69 02 |.W.....R...K..i.|
000000c0 32 10 00 90 00 00 00 00 ad 64 00 00 02 13 40 01 |2........d....@.|
After tons of awk, sed and cut, the script converted the hex values into readable text. To do so, I used offset positioning, which marks the start and end position of each parameter to be converted. The resulting file after all the conversions looks like this:
[jaypal:~/Temp/GTP] cat textfile.txt
Beginning of DB Package Identifier: ==
Total Package Length: 508
Offset to Data Record Count field: 115
Data Source: GTP
Timestamp: 2011-10-25
Matching Site Processor ID: 1
DB Package format version: 1
DB Package Resolution Type: 0
DB Package Resolution Value: 1
DB Package Resolution Cause Value: 128
Transport Protocol: 0
SGSN IP Address: 220.206.129.47
GGSN IP Address: 202.4.210.51
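The offset-based decoding can be sketched roughly as follows. This is not the actual parser, only a minimal illustration. The byte offsets and field widths are assumptions inferred by matching the decoded values above against the hex bytes of the generated dump shown further down in the question; the real specification may differ. The helper names field, uint and ipv4 are hypothetical.

```shell
#!/bin/sh
# First 48 bytes of the generated hex dump from the question, as one string.
hex=3d3d01fc7347545002f1d6553c9f499c
hex=${hex}00010100018000dcce812f0000000000
hex=${hex}0000000000000000ca04d23300000000

# field OFFSET LENGTH: hex digits of LENGTH bytes starting at byte OFFSET (0-based).
field() {
  printf '%s\n' "$hex" | cut -c "$(( $1 * 2 + 1 ))-$(( ($1 + $2) * 2 ))"
}

# uint OFFSET LENGTH: the field read as a big-endian unsigned integer (assumed byte order).
uint() { printf '%d\n' "0x$(field "$1" "$2")"; }

# ipv4 OFFSET: four consecutive bytes printed as a dotted quad.
ipv4() {
  printf '%d.%d.%d.%d\n' \
    "0x$(field "$1" 1)" "0x$(field $(( $1 + 1 )) 1)" \
    "0x$(field $(( $1 + 2 )) 1)" "0x$(field $(( $1 + 3 )) 1)"
}

# Offsets below are inferred from the sample data, not taken from a spec.
echo "Total Package Length: $(uint 2 2)"                # 01 fc -> 508
echo "Offset to Data Record Count field: $(uint 4 1)"   # 73 -> 115
echo "SGSN IP Address: $(ipv4 23)"                      # dc ce 81 2f -> 220.206.129.47
echo "GGSN IP Address: $(ipv4 40)"                      # ca 04 d2 33 -> 202.4.210.51
```

Keeping the whole dump as one hex string means every field is just a substring, so adding a parameter is a matter of adding one offset and one width.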
Why did I do it
I am a test engineer, and manually validating binary files was a major pain. I had to walk through the offsets by hand, convert the values with a calculator, and validate them against Wireshark and the GUI.
Now the question part
I wish to do the reverse of what I did. This was my plan -
- Have an easy-to-read input text file which would have Parameters : Values.
- The user can simply put values next to them (e.g. Date would be a parameter and the user can give the date they want the data file to have).
- The script will cut out all relevant information (the user-provided information) from the input text file and convert it into hex values.
- Once the file has been converted into hex values, I wish to encode it back into binary.
First three steps are done
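For illustration, the value-to-hex step might look something like this. It is a sketch only: the helper names uint_hex, ipv4_hex and ascii_hex are hypothetical, and big-endian byte order and the field widths are assumptions inferred from the sample dump (508 appearing as 01 fc), not from a specification.

```shell
#!/bin/sh
# uint_hex VALUE BYTES: VALUE as a big-endian hex string BYTES bytes wide.
uint_hex() { printf "%0$(( $2 * 2 ))x\n" "$1"; }

# ipv4_hex A.B.C.D: a dotted quad as four hex bytes.
# The function body runs in a subshell so the IFS change does not leak out.
ipv4_hex() (
  IFS=.
  set -- $1                      # split the address on dots
  printf '%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
)

# ascii_hex TEXT: each character of TEXT as its hex byte value.
ascii_hex() { printf '%s' "$1" | od -An -tx1 | tr -d ' \n'; echo; }

uint_hex 508 2              # -> 01fc
ipv4_hex 220.206.129.47     # -> dcce812f
ascii_hex GTP               # -> 475450
```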
Problem
Once my script converts the input text file into a text file with hex values, I get a file like the following (notice I can do cat on it).
[visdba@hw-diam-test01 ParserDump]$ cat temp_file | sed 's/.\{32\}/&\n/g' | sed 's/../& /g'
3d 3d 01 fc 73 47 54 50 02 f1 d6 55 3c 9f 49 9c
00 01 01 00 01 80 00 dc ce 81 2f 00 00 00 00 00
00 00 00 00 00 00 00 00 ca 04 d2 33 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10
01 01 0f 00 00 07 04 ea 00 00 ff ff 00 00 14 b7
00 00 ff ff 00 00 83 ec 00 00 83 62 54 14 59 00
60 38 34 f5 01 01 0b 58 62 70 11 60 f6 ff ff ff
ff ff ff 02 00 7c 00 d0 01 4c 00 b0 40 40 28 02
f1 d6 55 38 cb 2b 23 50 4f 43 30 31 00 00 00 00
00 04 0a 63 63 07 00 00 00 00 00 00 00 00 00 00
My intention is to encode this converted file into binary, so that when I do cat on the file, I get a bunch of garbage values.
[jaypal~/Temp/GTP]$ cat temp.file
==?sGTP?ղ?N????W????&Xx1?T?&Xx1?;
?d@#e?
?0H????????|?X?@@(?ղ??VtPOC01
cceE??k@9??W傇??R?K?i2??d@#e???&Xx1&Xx??!?
blackberrynet?/??!
??!
So the question is this. How do I encode it in this form?
Why I want to do this?
We don't have a lot of GTP (GPRS Tunnelling Protocol) messages on production. I thought if I reverse engineer this, I could effectively create a data generator and make my own data.
Sum things up
There may be sophisticated tools out there, but I don't want to spend too much time learning them. It's been around 2 months since I started working on the *nix platform, and I am just getting the hang of its power tools like sed and awk.
What I do want is some help and guidance to make this happen.
Thanks again for reading! 200 points await someone who can guide me in the right direction. :)
Sample Files
Here is a sample of Original Binary File
Here is a sample of Input Text File that would allow the User to punch in values
Here is a sample of File that my script creates after all the conversion from the Input Text File is complete.
How do I change the encoding of File 3 to File 1?
You can use xxd to convert to and from binary files / hexdumps quite simply.
data to hex
hex to data
or
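A minimal sketch of both directions, assuming xxd (it ships with Vim) is installed. The file names are hypothetical and the sample bytes are only there to make the demo self-contained.

```shell
#!/bin/sh
# Hypothetical file names; a temporary directory keeps the demo self-contained.
tmp=$(mktemp -d)
printf 'sGTP\002\361' > "$tmp/data"            # six raw bytes: 73 47 54 50 02 f1

xxd "$tmp/data" > "$tmp/data.hex"              # data to hex (offsets, hex, ASCII columns)
xxd -r "$tmp/data.hex" > "$tmp/copy1"          # hex to data

xxd -p "$tmp/data" > "$tmp/data.plain"         # data to plain hex, no offsets
xxd -r -p "$tmp/data.plain" > "$tmp/copy2"     # plain hex to data

# With -r -p the spacing and line breaks are ignored, so a space-separated
# dump like the one in the question can be fed in directly.
printf '73 47 54 50\n02 f1\n' | xxd -r -p > "$tmp/copy3"

cmp "$tmp/data" "$tmp/copy1" && cmp "$tmp/data" "$tmp/copy2" &&
  cmp "$tmp/data" "$tmp/copy3" && echo "round trip OK"
```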
The -p is postscript (plain) mode, which allows for more free-form input.
This is the output from xxd -r -p text, where text is the data you give above.
To change encoding from File3 to File1, you use a script like this:
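One possible shape for such a script, as a sketch rather than the exact code meant here: it uses only POSIX sh, tr, fold and printf, and the function name hex2bin is hypothetical. It assumes the input holds an even number of hex digits, with any spacing or line breaks between them.

```shell
#!/bin/sh
# hex2bin: read hex digits on stdin (whitespace ignored), write raw bytes to stdout.
hex2bin() {
  tr -d ' \t\r\n' | fold -w 2 |
  # The "|| [ -n ... ]" part keeps the last pair, which has no trailing newline.
  while read -r byte || [ -n "$byte" ]; do
    # Turn the hex pair into a three-digit octal escape, which printf
    # then emits as a single raw byte (including NUL).
    printf "\\$(printf '%03o' "0x$byte")"
  done
}

# Usage, with the file names from the question:
#   hex2bin < temp_file > temp.file
# Or in a pipeline:
#   cat temp_file | hex2bin | hexdump -C
```

Spawning one printf per byte is slow on large files, so this suits packets of a few hundred bytes better than bulk data.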
Or, if you just want to pipe it, as in the xxd example in this thread:
If you really want to use BASH for this, then I suggest you start using an array to build your packet nicely. Here is some starting code:
Output:
Sure, this is not a solution to the original post... The solution will use something like this to generate binary output. The biggest problem is that we still do not know the types of the fields in the packet. We also do not know the architecture (is it big-endian or little-endian, is it 32-bit or 64-bit). You must give us the specification. For instance, what type is the length of the package? We cannot tell that from the TXT file!
In order to help you do what you have to do, you must find us the specification of the sizes of those fields.
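To make the array idea concrete, here is a small bash sketch. The helper names init, write and emit are hypothetical, and the offsets in the demo are guesses taken from the sample dump in the question, not from a specification.

```shell
#!/usr/bin/env bash
# buffer[] holds the packet, one byte per element, as two hex digits.
buffer=()

# init SIZE: create a zero-filled buffer of SIZE bytes.
init() {
  local i
  buffer=()
  for (( i = 0; i < $1; i++ )); do buffer[i]=00; done
}

# write OFFSET "HEX BYTES": store space-separated hex bytes starting at OFFSET.
write() {
  local offset=$1 byte
  for byte in $2; do
    buffer[offset]=$byte
    offset=$(( offset + 1 ))
  done
}

# emit: print the buffer as raw bytes, using an octal escape for each byte.
emit() {
  local byte
  for byte in "${buffer[@]}"; do
    printf "\\$(printf '%03o' "0x$byte")"
  done
}

init 8
write 0 "3d 3d"        # package identifier "=="
write 2 "01 fc"        # total package length 508 (assumed 2 bytes, big-endian)
write 5 "47 54 50"     # data source "GTP"
emit | od -An -tx1     # inspect the result; use emit > packet.bin to save it
```

Because every field is written at an explicit offset, fields can be filled in any order and untouched bytes stay zero.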
Note it is a good start though. You need to implement convenience functions to, for example, automatically fill buffer[] with values from a string of hex values, so you can do something like write $offset "ff c0 d3 ba be".
Using cut and awk, you can do it fairly simply using a gawk (GNU Awk) extension function, strtonum():
Or, if you are using a non-GNU version of 'new awk', then you can use:
If you want to use other tools (Perl and Python spring to mind; Ruby would be another possibility), you can do it easily enough.
odx is a program similar to the hexdump program. The script above was modified to read 'hexdump.out' as the input file, with the output piped into odx instead of a file, and gives the following output:
Or, using hexdump -C in place of odx:
awk is the wrong tool for the job here, but there are a thousand ways to do it. The easiest way is often a small C program, or any other language that explicitly makes a distinction between a character and a string of decimal digits.
However, to do it in awk, use the "%c" printf format.
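A sketch of the "%c" approach, written to avoid gawk-only functions. The names hex2bin_awk and hexval are hypothetical. It assumes each input line holds whole bytes (an even number of hex digits). LC_ALL=C is set because gawk in a UTF-8 locale prints values above 127 as multi-byte characters; how "%c" treats a zero byte can vary between awk implementations, so it is worth checking the result with od or hexdump.

```shell
#!/bin/sh
# hex2bin_awk: turn hex digits on stdin into raw bytes using awk's printf "%c".
hex2bin_awk() {
  LC_ALL=C awk '
    # hexval(h): numeric value of a hex string, without gawk strtonum().
    function hexval(h,    i, n) {
      h = tolower(h)
      n = 0
      for (i = 1; i <= length(h); i++)
        n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
      return n
    }
    {
      gsub(/[ \t\r]/, "")                       # drop spacing between bytes
      for (i = 1; i < length($0); i += 2)       # walk the line two digits at a time
        printf "%c", hexval(substr($0, i, 2))   # numeric argument -> one character
    }'
}

printf '73 47 54 50\n' | hex2bin_awk            # prints sGTP (no trailing newline)
```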
There's a tool, binmake, that lets you describe binary data in text format and generate a binary file (or write to stdout). It allows changing the endianness and number formats, and accepts comments.
First get and compile binmake (the binary program will be in bin/):
Create your text file file.txt:
Generate your binary file file.bin:
You can also pipe it using stdin and stdout: