Question:
As I am pretty new to assembly, I have a few questions about how to convert a letter to uppercase if the user enters a lowercase letter, or vice versa. Here is what I have so far:
section .data
Enter db "Enter: "
Enter_Len equ $-Enter
Output db "Output: "
Output_Len equ $-Output
Thanks db "Thanks!"
Thanks_Len equ $-Thanks
Loop_Iter dd 0 ; Loop counter
section .bss
In_Buffer resb 2
In_Buffer_Len equ $-In_Buffer
section .text
global _start
_start:
; Print Enter message
mov eax, 4 ; sys_write
mov ebx, 1
mov ecx, Enter
mov edx, Enter_Len
int 80h
; Read input
mov eax, 3 ; sys_read
mov ebx, 0
mov ecx, In_Buffer
mov edx, In_Buffer_Len
int 80h
So basically, if I am correct, my edx contains the string entered. Now comes the dilemma of converting from lowercase to uppercase and from uppercase to lowercase. As I am absolutely new to this, I have literally no clue what to do. Any help would be much appreciated :)
Answer 1:
If you only support ASCII, then you can force lowercase using an OR 0x20
or eax, 0x20
Similarly, you can transform a letter to uppercase by clearing that bit:
and eax, 0xDF ; or use ~0x20
And as nneonneo mentioned, the character case can be swapped using the XOR instruction:
xor eax, 0x20
That only works if eax is between 'a' and 'z' or 'A' and 'Z', so you'd have to compare and make sure you are in the range:
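cmp eax, 'A'
jb .not_upper ; below 'A': not an uppercase letter
cmp eax, 'Z'
ja .not_upper ; above 'Z': not an uppercase letter
or eax, 0x20 ; uppercase letter: set bit 5 to make it lowercase
.not_upper:
I used NASM syntax. The unsigned jumps jb and ja are used because character codes are unsigned values; the signed jl and jg also work as long as eax holds a plain 7-bit ASCII code.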
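For readers who find the logic easier to follow in a high-level language, here is a minimal C sketch of the same range-checked conversion. The helper names to_lower_ascii and to_upper_ascii are made up for this illustration, and the code assumes plain ASCII input.

```c
/* ASCII-only case conversion with range checks.
   to_lower_ascii / to_upper_ascii are hypothetical helper names. */

/* Return c as lowercase if it is 'A'..'Z'; otherwise return it unchanged. */
unsigned char to_lower_ascii(unsigned char c)
{
    if (c >= 'A' && c <= 'Z')
        c |= 0x20;              /* set bit 5, like: or eax, 0x20 */
    return c;
}

/* Return c as uppercase if it is 'a'..'z'; otherwise return it unchanged. */
unsigned char to_upper_ascii(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        c &= 0xDF;              /* clear bit 5, like: and eax, 0xDF */
    return c;
}
```

Without the range check, a character such as '@' (0x40) would be turned into '`' (0x60) by the OR, which is why the comparison comes first.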
If you need to transform international characters, that is a lot more complicated, unless you can call a libc function that accepts wide characters, such as towlower() or towupper() (the wide-character counterparts of tolower() and toupper()).
As a fair question: why would it work? (asked by kuhaku)
ASCII characters (also ISO-8859-1) have the basic uppercase characters defined between 0x41 and 0x5A and the lowercase characters between 0x61 and 0x7A.
To turn a high nibble of 4 into 6 and a high nibble of 5 into 7, you force bit 5 (0x20) to be set.
To go to uppercase, you do the opposite, you remove bit 5 so it becomes zero.
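The nibble arithmetic can be checked with a few lines of C. This is only an illustrative sketch; high_nibble, set_bit5 and clear_bit5 are hypothetical names chosen for this example.

```c
/* Hypothetical helpers illustrating the bit layout of ASCII letters. */

/* Upper four bits of the character code: 4 or 5 for 'A'..'Z', 6 or 7 for 'a'..'z'. */
unsigned high_nibble(unsigned char c)
{
    return (unsigned)c >> 4;
}

/* Force bit 5 (0x20) on: 0x41 ('A') becomes 0x61 ('a'). */
unsigned char set_bit5(unsigned char c)
{
    return (unsigned char)(c | 0x20);
}

/* Force bit 5 (0x20) off: 0x7A ('z') becomes 0x5A ('Z'). */
unsigned char clear_bit5(unsigned char c)
{
    return (unsigned char)(c & 0xDF);
}
```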
Answer 2:
Okay, but your string is not in edx, it's in [ecx] (or [In_Buffer]) (and it's only one useful character). To get a single character...
mov al, [ecx]
In a HLL you do "if some condition, execute this code". You might wonder how the CPU knows whether to execute the code or not. What we really do (HLLs do this for you) is "if NOT condition, skip over this code" (to a label). Experiment with it, you'll figure it out.
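To make the "skip over this code" idea concrete, here is a small C sketch written with goto so that it reads like the compare-and-jump sequence you would write in assembly. The function name upper_if_lower and the label skip are invented for this example.

```c
/* "If NOT condition, skip over this code", written the way assembly does it.
   upper_if_lower is a hypothetical name used only for this illustration. */
unsigned char upper_if_lower(unsigned char c)
{
    if (c < 'a') goto skip;         /* like: cmp al, 'a' / jb skip */
    if (c > 'z') goto skip;         /* like: cmp al, 'z' / ja skip */
    c = (unsigned char)(c - 0x20);  /* only reached for 'a'..'z' */
skip:
    return c;
}
```

In ordinary C you would write if (c >= 'a' && c <= 'z') instead; the compiler turns that into inverted conditional jumps much like these.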
Exit cleanly, whatever path your code takes. You don't show this, but I assume you do it.
I just posted some info on sys_read here. It's for a completely different program (adding two numbers, "hex" numbers), but the part about sys_read might interest you...
Answer 3:
Cute trick: if they type only letters, you can XOR their input letters with 0x20 to swap their case.
Then, if they can type more than letters, you just have to check each letter to see if it is alphabetical before XORing it. You can do that with a test to see if it lies in the ranges 'a' to 'z' or 'A' to 'Z', for example.
Alternately, you can just map each letter through a 256-element table which maps the characters the way you want them (this is usually how functions like toupper are implemented, for example).
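A rough C sketch of the table approach follows. It builds the 256-entry table once, using the range check plus XOR 0x20 described above, and then maps every byte of a buffer through it. The names swap_table, init_swap_table and swap_case are assumptions made for this example, not taken from any library.

```c
#include <stddef.h>

/* 256-entry lookup table: swap_table[c] is c with its case swapped
   if c is an ASCII letter, and c itself otherwise. (Hypothetical names.) */
static unsigned char swap_table[256];

/* Fill the table once before use. */
void init_swap_table(void)
{
    for (int i = 0; i < 256; i++) {
        unsigned char c = (unsigned char)i;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            c ^= 0x20;          /* flip bit 5 only for letters */
        swap_table[i] = c;
    }
}

/* Swap the case of every ASCII letter in buf, in place. */
void swap_case(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = swap_table[buf[i]];
}
```

Call init_swap_table() once at startup; after that the per-character work is a single memory lookup with no comparisons or branches.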
Answer 4:
Here is a NASM program I hacked together that flips the case of a string. You basically need to loop over the string, check each character against the ASCII boundaries, and then add or subtract 0x20 to change the case (that is the distance between uppercase and lowercase in ASCII). You can use the Linux ascii command to see a table of ASCII values.
File: flipcase.asm
section .text
global _start ; Entry point for linker (ld)
; Linker entry point
_start:
mov rcx,len ; Place length of message into rcx
mov rbp,msg ; Place address of our msg into rbp
dec rbp ; Adjust count to offset
; Go through the buffer and convert lowercase to uppercase characters:
upperScan:
cmp byte [rbp+rcx],0x41 ; Test input char against uppercase 'A'
jb lowerScan ; Not uppercase Ascii < 0x41 ('A') - jump below
cmp byte [rbp+rcx],0x5A ; Test input char against uppercase 'Z'
ja lowerScan ; Not uppercase Ascii > 0x5A ('Z') - jump above
; At this point, we have a uppercase character
add byte [rbp+rcx],0x20 ; Add 0x20 to get the lowercase Ascii value
jmp Next ; Done, jump to next
lowerScan:
cmp byte [rbp+rcx],0x61 ; Test input char against lowercase 'a'
jb Next ; Not lowercase Ascii < 0x61 ('a') - jump below
cmp byte [rbp+rcx],0x7A ; Test input char against lowercase 'z'
ja Next ; Not lowercase Ascii > 0x7A ('z') - jump above
; At this point, we have a lowercase char
sub byte [rbp+rcx],0x20 ; Subtract 0x20 to get the uppercase Ascii value
; Fall through to next
Next:
dec rcx ; Decrement counter
jnz upperScan ; If characters remain, loop back
; Write the buffer full of processed text to stdout:
Write:
mov rbx,1 ; File descriptor 1 (stdout)
mov rax,4 ; System call number (sys_write)
mov rcx,msg ; Message to write
mov rdx,len ; Length of message to write
int 0x80 ; Call kernel interrupt
mov rax,1 ; System call number (sys_exit)
xor rbx,rbx ; Exit status 0 (success)
int 0x80 ; Call kernel
section .data
msg db 'hELLO, wwwoRLD!',0xa ; Our dear string
len equ $ - msg ; Length of our dear string
Then you can assemble, link, and run it with:
$> nasm -felf64 flipcase.asm && ld -melf_x86_64 -o flipcase flipcase.o && ./flipcase
Answer 5:
Jeff Duntemann's book Assembly Language Step-by-Step: Programming with Linux covers this topic very well on pages 275-277.
There he shows that with the instruction sub byte [ebp+ecx], 20h you can change lowercase to uppercase.
Note that this example uses a 1024-byte buffer, which is a faster and better approach than the earlier example on pages 268-269, where the buffer holds only one byte (8 bits) at a time.