X86 NASM Assembly converting lower to upper and up

Posted 2019-06-07 04:00

Question:

As I am pretty new to assembly, I have a few questions about how to convert a letter to uppercase if the user enters a lowercase letter, or vice versa. Here is what I have so far:

section .data
Enter db "Enter: "
Enter_Len equ $-Enter

Output db "Output: "
Output_Len equ $-Output

Thanks db "Thanks!"
Thanks_Len equ $-Thanks

Loop_Iter dd 0 ; Loop counter

section .bss
In_Buffer resb 2
In_Buffer_Len equ $-In_Buffer

section .text
global _start

_start:
    ; Print Enter message
    mov eax, 4 ; sys_write
    mov ebx, 1
    mov ecx, Enter
    mov edx, Enter_Len
    int 80h

    ; Read input
    mov eax, 3 ; sys_read
    mov ebx, 0
    mov ecx, In_Buffer
    mov edx, In_Buffer_Len
    int 80h

So basically, if I am correct, my edx contains the string entered. Now comes the dilemma of converting from lowercase to uppercase and from uppercase to lowercase. As I am absolutely new to this, I have literally no clue what to do. Any help would be much appreciated :)

Answer 1:

If you only support ASCII, then you can force a letter to lowercase with an OR 0x20:

  or   eax, 0x20

Similarly, you can transform a letter to uppercase by clearing that bit:

  and  eax, 0xDF   ; clears bit 5 (0x20); or use ~0x20

And as nneonneo mentioned, the character case can be swapped using the XOR instruction:

  xor  eax, 0x20

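That only works if eax holds a letter, i.e. a value between 'a' and 'z' or between 'A' and 'Z', so you have to compare first and make sure you are in range. For example, to convert only lowercase letters to uppercase:

  cmp  eax, 'a'
  jl   .not_lower
  cmp  eax, 'z'
  jg   .not_lower
  and  eax, 0xDF   ; clear bit 5: lowercase -> uppercase
.not_lower: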

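I used NASM syntax. Double-check the jl and jg too: they are signed comparisons, which is fine as long as eax holds a zero-extended character; jb and ja are the unsigned equivalents.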

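If you need to transform any international character, then that's a lot more complicated. The libc tolower() and toupper() functions only handle single-byte characters in the current locale; for Unicode you would need to call something like the wide-character towlower() or towupper().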


As a fair question: why would it work? (asked by kuhaku)

ASCII characters (also ISO-8859-1) have the basic uppercase characters defined between 0x41 and 0x5A and the lowercase characters between 0x61 and 0x7A.

To turn the high nibble 4 into 6 and 5 into 7 (uppercase to lowercase), you force bit 5 (0x20) to be set.

To go to uppercase, you do the opposite: you clear bit 5 so it becomes zero.
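
As a small worked example of the above (a sketch only; the choice of the al register is arbitrary), here is what each operation does to the letter 'A':

```nasm
; 'A' = 0x41 = 0100 0001b
; 'a' = 0x61 = 0110 0001b   (the two differ only in bit 5, value 0x20)
    mov  al, 'A'        ; al = 0x41
    or   al, 0x20       ; set bit 5    -> al = 0x61 ('a')
    and  al, 0xDF       ; clear bit 5  -> al = 0x41 ('A')
    xor  al, 0x20       ; toggle bit 5 -> al = 0x61 ('a')
    xor  al, 0x20       ; toggle again -> al = 0x41 ('A')
```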



Answer 2:

Okay, but your string is not in edx; it's in [ecx] (or [In_Buffer]), and with a 2-byte buffer it holds only one useful character. edx just holds the buffer length you passed to sys_read. To get a single character...

mov al, [ecx]

In a HLL you do "if some condition, execute this code". You might wonder how the CPU knows whether to execute the code or not. What we really do (HLLs do this for you) is "if NOT condition, skip over this code" (to a label). Experiment with it, you'll figure it out.
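
A minimal sketch of that "skip over" pattern, assuming ecx still points at In_Buffer after sys_read (the label names check_char and .skip are made up for illustration):

```nasm
; HLL version: if (c >= 'a' && c <= 'z') c -= 0x20;
check_char:
    mov  al, [ecx]      ; fetch the character
    cmp  al, 'a'
    jb   .skip          ; below 'a': condition is false, skip the conversion
    cmp  al, 'z'
    ja   .skip          ; above 'z': condition is false, skip the conversion
    sub  al, 0x20       ; only reached for 'a'..'z': make it uppercase
    mov  [ecx], al      ; store the converted character back
.skip:
    ; execution continues here whether or not the conversion ran
```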

Exit cleanly, whatever path your code takes. You don't show this, but I assume you do it.
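
For reference, a clean exit with the 32-bit int 80h interface used in the question usually looks something like this (the exit status 0 is an assumption; use whatever status suits your program):

```nasm
    mov  eax, 1         ; sys_exit
    xor  ebx, ebx       ; exit status 0
    int  80h
```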

I just posted some info on sys_read here.

It's for a completely different program (adding two numbers - "hex" numbers) but the part about sys_read might interest you...



Answer 3:

Cute trick: if they type only letters, you can XOR their input letters with 0x20 to swap their case.

Then, if they can type more than letters, you just have to check each letter to see if it is alphabetical before XORing it. You can do that with a test to see if it lies in the ranges 'a' to 'z' or 'A' to 'Z', for example.
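
One way to sketch that check: fold the character to lowercase in a scratch register and do a single unsigned range comparison before flipping the original. The label names and the choice of al/dl are only illustrative, and ecx is assumed to point at the character.

```nasm
; Flip the case of the byte at [ecx] only if it is an ASCII letter.
; Assumption: ecx points at the character (e.g. In_Buffer).
flip_if_letter:
    mov  al, [ecx]      ; original character
    mov  dl, al         ; scratch copy for the range test
    or   dl, 0x20       ; fold 'A'..'Z' onto 'a'..'z'
    sub  dl, 'a'        ; letters become 0..25; anything else ends up above 25
    cmp  dl, 25
    ja   .done          ; unsigned compare: not a letter, leave it alone
    xor  al, 0x20       ; toggle bit 5 to swap the case
    mov  [ecx], al      ; store the result back
.done:
```

This replaces the four comparisons for the two ranges with one, at the cost of a scratch register.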

Alternately, you can just map each letter through a 256-element table which maps the characters the way you want them (this is usually how functions like toupper are implemented, for example).
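
A rough sketch of the table approach, with the table generated by the NASM preprocessor at assembly time. The names flip_table, tbl_i and map_char are invented for this example, and ecx is again assumed to point at the character; this is not how any particular libc lays out its tables.

```nasm
section .data
; 256-entry table: entry N holds the case-flipped version of byte N,
; or N itself when N is not an ASCII letter.
flip_table:
%assign tbl_i 0
%rep 256
  %if (tbl_i >= 0x41 && tbl_i <= 0x5A) || (tbl_i >= 0x61 && tbl_i <= 0x7A)
    db tbl_i ^ 0x20     ; 'A'..'Z' or 'a'..'z': store the opposite case
  %else
    db tbl_i            ; everything else maps to itself
  %endif
  %assign tbl_i tbl_i+1
%endrep

section .text
; Assumption: ecx points at the character to convert.
map_char:
    movzx eax, byte [ecx]           ; zero-extend so eax is an index 0..255
    mov   al, [flip_table + eax]    ; look up the replacement byte
    mov   [ecx], al                 ; store it back
```

The lookup itself needs no branches, and changing the mapping (uppercase only, lowercase only, or something locale-specific) only means changing the table.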



Answer 4:

Here is a NASM program I hacked together that flips the case of a string. You basically loop over the string, check each character against the ASCII boundaries, and then add or subtract 0x20 to change the case (that is the distance between uppercase and lowercase in ASCII). You can use the Linux ascii command to see a table of ASCII values.

File: flipcase.asm

section     .text
global      _start                 ; Entry point for linker (ld)

  ; Linker entry point                                
_start:                                                         
    mov     rcx,len                ; Place length of message into rcx
    mov     rbp,msg                ; Place address of our msg into rbp    
    dec     rbp                    ; Adjust count to offset

  ; Go through the buffer and flip the case of each letter:
upperScan:
    cmp byte [rbp+rcx],0x41        ; Test input char against uppercase 'A'                 
    jb lowerScan                   ; Not uppercase Ascii < 0x41 ('A') - jump below
    cmp byte [rbp+rcx],0x5A        ; Test input char against uppercase 'Z' 
    ja lowerScan                   ; Not uppercase Ascii > 0x5A ('Z') - jump above  
     ; At this point, we have an uppercase character
    add byte [rbp+rcx],0x20        ; Add 0x20 to get the lowercase Ascii value
    jmp Next                       ; Done, jump to next

lowerScan:
    cmp byte [rbp+rcx],0x61        ; Test input char against lowercase 'a'
    jb Next                        ; Not lowercase Ascii < 0x61 ('a') - jump below
    cmp byte [rbp+rcx],0x7A        ; Test input char against lowercase 'z'
    ja Next                        ; Not lowercase Ascii > 0x7A ('z') - jump above
     ; At this point, we have a lowercase char
    sub byte [rbp+rcx],0x20        ; Subtract 0x20 to get the uppercase Ascii value
     ; Fall through to next        

Next:   
    dec rcx                        ; Decrement counter
    jnz upperScan                  ; If characters remain, loop back

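  ; Write the buffer full of processed text to stdout.
  ; Note: int 0x80 is the 32-bit system-call interface; in 64-bit code it
  ; only sees the low 32 bits of each register: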
Write:        
    mov     rbx,1                  ; File descriptor 1 (stdout)    
    mov     rax,4                  ; System call number (sys_write)
    mov     rcx,msg                ; Message to write        
    mov     rdx,len                ; Length of message to write
    int     0x80                   ; Call kernel interrupt
    mov     rax,1                  ; System call number (sys_exit)
    mov     rbx,0                  ; Exit status 0 (rbx still held 1 from the write)
    int     0x80                   ; Call kernel

section     .data

msg     db  'hELLO, wwwoRLD!',0xa  ; Our dear string
len     equ $ - msg                ; Length of our dear string

Then you can compile and run it with:
$> nasm -felf64 flipcase.asm && ld -melf_x86_64 -o flipcase flipcase.o && ./flipcase



Answer 5:

Jeff Duntemann wrote a book called Assembly Language Step by Step: Programming with Linux, which covers this topic very well on pages 275-277.

There he shows that you can change lowercase to uppercase with the instruction sub byte [ebp+ecx], 20h. Note that this example uses a 1024-byte buffer, which is a faster and better way to do it than the earlier example on pages 268-269, where the buffer holds only one byte (8 bits) at a time.
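
To give an idea of the buffered approach, here is a rough 32-bit sketch built around that same sub byte [ebp+ecx], 20h instruction. It is not the book's listing; the names Buff and BUFFLEN and the label names are assumptions made for this example.

```nasm
; Read stdin in chunks of up to 1024 bytes, uppercase each chunk in place,
; and write it to stdout. 32-bit Linux, int 80h interface.
section .bss
BUFFLEN equ 1024
Buff    resb BUFFLEN

section .text
global _start

_start:
.read:
    mov  eax, 3             ; sys_read
    mov  ebx, 0             ; stdin
    mov  ecx, Buff
    mov  edx, BUFFLEN
    int  80h                ; eax = number of bytes read
    cmp  eax, 0
    jle  .done              ; 0 = end of input, negative = error
    mov  esi, eax           ; remember the byte count for sys_write
    mov  ecx, eax           ; ecx counts down from the byte count to 1
    mov  ebp, Buff
    dec  ebp                ; so that [ebp+ecx] covers Buff[count-1]..Buff[0]
.scan:
    cmp  byte [ebp+ecx], 'a'
    jb   .next              ; below 'a': not lowercase
    cmp  byte [ebp+ecx], 'z'
    ja   .next              ; above 'z': not lowercase
    sub  byte [ebp+ecx], 20h ; lowercase -> uppercase
.next:
    dec  ecx
    jnz  .scan              ; more bytes left in this chunk
    mov  eax, 4             ; sys_write
    mov  ebx, 1             ; stdout
    mov  ecx, Buff
    mov  edx, esi           ; write exactly the bytes that were read
    int  80h
    jmp  .read              ; fetch the next chunk
.done:
    mov  eax, 1             ; sys_exit
    xor  ebx, ebx           ; exit status 0
    int  80h
```

Reading a whole chunk per sys_read means far fewer kernel calls than reading one character at a time, which is where the speed difference comes from.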