NASM Linux x64 | Encode binary to base64

Posted 2019-08-24 06:20

Question:

I'm trying to encode a binary file into base64. However, I'm stuck on a few steps, and I'm not sure this is the right way to think about it; see the comments in the code below:

SECTION .bss            ; Section containing uninitialized data

    BUFFLEN equ 6       ; We read the file 6 bytes at a time
    Buff:   resb BUFFLEN    ; Text buffer itself

SECTION .data           ; Section containing initialised data

    B64Str: db "000000"
    B64LEN equ $-B64Str

    Base64: db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

SECTION .text           ; Section containing code

global  _start          ; Linker needs this to find the entry point!

_start: 
    nop         ; This no-op keeps gdb happy...

; Read a buffer full of text from stdin:
Read:
    mov eax,3       ; Specify sys_read call
    mov ebx,0       ; Specify File Descriptor 0: Standard Input
    mov ecx,Buff        ; Pass offset of the buffer to read to
    mov edx,BUFFLEN     ; Pass number of bytes to read at one pass
    int 80h         ; Call sys_read to fill the buffer
    mov ebp,eax     ; Save # of bytes read from file for later
    cmp eax,0       ; If eax=0, sys_read reached EOF on stdin
    je Done         ; Jump If Equal (to 0, from compare)

; Set up the registers for the process buffer step:
    mov esi,Buff        ; Place address of file buffer into esi
    mov edi,B64Str      ; Place address of line string into edi
    xor ecx,ecx     ; Clear line string pointer to 0


; Get 6 bits from the input


; Convert them to a B64 char

; Print the char

; Move on to the next 6 bits


; All done! Let's end this party:
Done:
    mov eax,1       ; Code for Exit Syscall
    mov ebx,0       ; Return a code of zero 
    int 80H         ; Make kernel call

In words, it should do the following:

1) Hex value :

7C AA 78

2) Binary value :

0111 1100 1010 1010 0111 1000

3) Groups in 6 bits :

011111 001010 101001 111000

4) Convert to numbers :

31 10 41 56

5) Each number is a letter, number or symbol :

31 = f
10 = K
41 = p
56 = 4

So the final output is: fKp4

So my questions are: how do I get the 6 bits, and how do I convert those bits into a char?

Answer 1:

There are two main ways to implement this: a generic loop that can pick out any 6 bits, or fixed code that handles 24 bits (3 bytes) of input at a time. The fixed variant produces exactly 4 base64 characters per pass and ends on a byte boundary, so the next 24 bits can be read from offset +3.
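The first option, the generic loop, is not shown in this answer, so here is a rough C sketch of the idea to use as a reference model before writing the assembly. It is not NASM and not the only way to do it. The names encode_generic and b64_table are made up for illustration, and the sketch leaves out the '=' padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same 64-character alphabet as the Base64 label in the question. */
static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: generic bit-accumulator loop. Encodes n bytes into
   out (no '=' padding, no NUL terminator) and returns the number of
   characters written. out needs room for (n * 8 + 5) / 6 characters. */
size_t encode_generic(const uint8_t *in, size_t n, char *out)
{
    assert(in != NULL && out != NULL);

    uint32_t acc = 0;   /* bits read so far; only the low nbits still wait */
    int nbits = 0;      /* how many bits in acc have not been emitted yet */
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        acc = (acc << 8) | in[i];   /* append 8 new bits at the bottom */
        nbits += 8;
        while (nbits >= 6) {        /* emit every complete 6-bit group */
            nbits -= 6;
            out[o++] = b64_table[(acc >> nbits) & 0x3F];
        }
    }
    if (nbits > 0)                  /* 2 or 4 leftover bits: fill with zeros */
        out[o++] = b64_table[(acc << (6 - nbits)) & 0x3F];
    return o;
}
```

For the bytes 7C AA 78 from the question this writes the four characters fKp4. The rest of this answer covers the second option, the fixed 3-byte variant.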

Let's say esi points to the source binary data, which is padded with enough zero bytes that reading past the end of the input is safe (up to 3 extra bytes in the worst case).

And edi points to an output buffer of at least ((input_length+2)/3*4) bytes (integer division), which leaves room for the '=' padding that B64 requires at the end of the output.
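As a quick check of that size formula, here is a small C sketch. The function name b64_output_size is hypothetical, and the result counts only the base64 characters, without any terminator or line breaks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of base64 characters needed for
   input_length bytes of input, including any '=' padding. */
size_t b64_output_size(size_t input_length)
{
    /* Guard against wrap-around in input_length + 2. */
    assert(input_length <= (size_t)-1 - 2);

    /* Integer division rounds down, so adding 2 first rounds the number
       of 3-byte groups up; each group becomes 4 characters. */
    return (input_length + 2) / 3 * 4;
}
```

So 1, 2 and 3 bytes of input all need 4 bytes of output, 4 bytes need 8, and 900 bytes need 1200.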

; convert 3 bytes of input into four B64 characters of output
mov   eax,[esi]  ; read 3 bytes of input
      ; (reads actually 4B, 1 will be ignored)
add   esi,3      ; advance pointer to next input chunk
bswap eax        ; first input byte as MSB of eax
shr   eax,8      ; throw away the 1 junk byte (LSB after bswap)
; produce 4 base64 characters backward (last group of 6b is converted first)
; (to keep the 6b group extraction simple: "shr eax,6" + "and edx,0x3F")
mov   edx,eax    ; get copy of last 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (4th)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bl,[Base64+edx]  ; convert 0-63 value into B64 character (3rd)
shl   ebx,16     ; make room in ebx for next characters (4th+3rd now in upper 16b)
mov   edx,eax    ; get copy of next 6 bits
shr   eax,6      ; throw away 6bits being processed already
and   edx,0x3F   ; keep only last 6 bits
mov   bh,[Base64+edx]  ; convert 0-63 value into B64 character (2nd)
; here eax contains exactly only 6 bits (zero extended to 32b)
mov   bl,[Base64+eax]  ; convert 0-63 value into B64 character (1st)
mov   [edi],ebx  ; store four B64 characters as output
add   edi,4      ; advance output pointer
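Since the assembly above was not debugged, a register-by-register model in C may help when comparing against what the debugger shows. This is only a sketch under assumptions: model_chunk and bswap32 are made-up names, the variables eax, edx and ebx merely mirror the registers, and the little-endian load and store are written out explicitly so the model does not depend on the host machine.

```c
#include <assert.h>
#include <stdint.h>

/* Same alphabet as the Base64 label in the question. */
static const char Base64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Byte swap written out by hand, standing in for the bswap instruction. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Hypothetical model of one pass: src must have 4 readable bytes (the 4th
   is junk and ignored), dst receives 4 base64 characters. */
void model_chunk(const uint8_t src[4], char dst[4])
{
    /* mov eax,[esi]: x86 loads little-endian, so src[0] is the low byte. */
    uint32_t eax = (uint32_t)src[0] | ((uint32_t)src[1] << 8) |
                   ((uint32_t)src[2] << 16) | ((uint32_t)src[3] << 24);
    eax = bswap32(eax);   /* bswap eax: src[0] becomes the top byte */
    eax >>= 8;            /* shr eax,8: drop the junk byte, 24 bits remain */

    uint32_t ebx = 0;
    /* Characters are produced last to first from the low 6 bits. */
    for (int i = 3; i >= 0; i--) {
        uint32_t edx = eax & 0x3F;   /* and edx,0x3F */
        eax >>= 6;                   /* shr eax,6 */
        ebx |= (uint32_t)(uint8_t)Base64[edx] << (8 * i);
    }
    assert(eax == 0);     /* all 24 bits have been consumed */

    /* mov [edi],ebx: little-endian store, low byte of ebx goes first. */
    for (int i = 0; i < 4; i++)
        dst[i] = (char)(ebx >> (8 * i));
}
```

The loop folds the four unrolled extraction steps into one, so the order in which ebx is filled differs slightly from the assembly, but the value stored at the end should be the same.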

After the last group of input you must overwrite the end of the output with the right number of '=' characters, replacing the characters that came purely from the fake zero bytes. If the last group holds 1 byte of input (8 bits, so 2 B64 chars carry data), the output ends with '=='. With 2 bytes (16 bits, 3 B64 chars) it ends with '='. With 3 bytes all 24 bits are used and all 4 B64 chars are valid.
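The padding rule can be sketched in C as follows. The helper name encode_tail is hypothetical; it handles only the final group and treats missing bytes as zero, in the same way the zero-padded input buffer does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode the final 1..3 input bytes into exactly 4
   output characters, with '=' where no real data is carried. */
void encode_tail(const uint8_t *in, size_t n, char out[4])
{
    assert(in != NULL && n >= 1 && n <= 3);

    /* Build the 24-bit group; bytes that do not exist count as zero. */
    uint32_t bits = (uint32_t)in[0] << 16;
    if (n > 1) bits |= (uint32_t)in[1] << 8;
    if (n > 2) bits |= in[2];

    /* Convert all four 6-bit groups first, fake zeros included. */
    for (int i = 0; i < 4; i++)
        out[i] = b64_table[(bits >> (18 - 6 * i)) & 0x3F];

    /* 1 byte keeps 2 chars, 2 bytes keep 3 chars, 3 bytes keep all 4. */
    if (n < 3) out[3] = '=';
    if (n < 2) out[2] = '=';
}
```

With the ASCII bytes of "M", "Ma" and "Man" as input, this gives TQ==, TWE= and TWFu respectively.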

If you don't want to hold the whole file and the whole output in memory, you can use buffers of limited length, for example 900B of input producing 1200B of output, and process the input in 900B blocks. Or you can use a 3B input buffer and a 4B output buffer. In that case you can remove the pointer advancing completely (or even drop esi/edi and use fixed memory addresses), because every iteration has to load its input and store its output separately anyway.
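A rough C sketch of the 900B in, 1200B out variant is shown below. It assumes C stdio streams instead of the raw sys_read/sys_write calls, and the names encode_stream, IN_BLOCK and OUT_BLOCK are made up for illustration. Because the block size is a multiple of 3, only the final, shorter block can need '=' padding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { IN_BLOCK = 900, OUT_BLOCK = IN_BLOCK / 3 * 4 };  /* 900B -> 1200B */

static const char b64_table[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: base64-encode everything from in to out, one block
   at a time. Returns 0 on success, -1 on a read or write error. */
int encode_stream(FILE *in, FILE *out)
{
    assert(in != NULL && out != NULL);

    uint8_t src[IN_BLOCK + 2];   /* 2 spare bytes for the fake zeros */
    char dst[OUT_BLOCK];
    size_t n;

    /* fread returns fewer than IN_BLOCK bytes only at end of file or on
       error, so every block except the last is a multiple of 3. */
    while ((n = fread(src, 1, IN_BLOCK, in)) > 0) {
        src[n] = 0;              /* zero bytes after the real data */
        src[n + 1] = 0;
        size_t o = 0;
        for (size_t i = 0; i < n; i += 3) {
            uint32_t bits = ((uint32_t)src[i] << 16) |
                            ((uint32_t)src[i + 1] << 8) | src[i + 2];
            for (int k = 0; k < 4; k++)
                dst[o++] = b64_table[(bits >> (18 - 6 * k)) & 0x3F];
        }
        /* 2 leftover bytes need one '=', 1 leftover byte needs two. */
        if (n % 3 >= 1) dst[o - 1] = '=';
        if (n % 3 == 1) dst[o - 2] = '=';
        if (fwrite(dst, 1, o, out) != o)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

Calling encode_stream(stdin, stdout) would behave like a minimal filter that writes unwrapped base64, with no line breaks in the output.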

Disclaimer: this code is written to be straightforward, not fast. You asked how to extract 6 bits and how to convert a value into a character, so I think staying with basic x86 asm instructions is the best fit.

I'm not even sure how to make it faster without profiling the code for bottlenecks and experimenting with other variants. The partial register usage (bh and bl versus ebx) is probably costly, so there is very likely a better solution, perhaps even a SIMD-optimized version for larger input blocks.

I also didn't debug this code; I just wrote it here in the answer. Proceed with caution and check in a debugger whether and how it works.