I'm currently working on a class project for Structured Computer Organization using an x86 processor. The value that I am accessing is a 1-byte char, but I do not know how to compare it to an uppercase letter. They said to use an ASCII table in hex format, but I'm not sure how to even compare the two.
void changeCase (char char_array[], int array_size ) {
    __asm {
        // BEGIN YOUR CODE HERE
        mov eax, char_array;        //eax is base image
        mov edi, 0;
    readArray:
        cmp edi, array_size;
        jge exit;
        mov ebx, edi;               //using ebx as offset
        shl ebx, 2;
        mov cl, [eax + ebx];        //using ecx to be the storage register
    check:
        //working on it
        cmp cl, 0x41;               //skip if cl is < ASCII value 65 (A)
        jl next_indx;
        cmp cl, 0x7A;               //skip if cl is > ASCII value 122 (z)
        jg next_indx;
        cmp cl, 'a';
        jl convert_down;
        jge convert_up;
    convert_down:
        or cl, 0x20;                //make it lowercase
        jmp write;
    convert_up:
        and cl, 0x20;               //make it uppercase
        jmp write;
    write:
        mov byte ptr [eax + ebx], cl    //slight funky town issue here,
    next_indx:
        inc edi;
    exit:
        cmp edi, array_size;
        jl readArray;
        mov char_array, eax;
        // END YOUR CODE HERE
    }
}
Anything helps at this point. Thank you in advance for the help!
edit 1:
Thanks for all the suggestions and points of clarity; I edited my code to reflect the changes. Now I'm getting an access violation.
edit 2 (+):
Thanks for the helpful eyes, people. I'm still working on getting it to translate all the letters.
Variations of this question get asked all the time. This version of the problem (requiring conditional behaviour beyond just `if(isalpha(c)) c|=0x20;`) made the problem complex enough that it wasn't immediately obvious how to do it efficiently. It turns out that `xor` wasn't hard to think of, and converting this code to unconditionally upcase or downcase only requires a simple change from `xor 0x20` to `and ~0x20` or `or 0x20`. (Simplifying a bit more is possible, too.)

Here's how I'd do it with an attempt at optimally efficient asm. I even included a version with SIMD vectors, and another version of the byte loop using the branchless idea I got from vectorizing it.
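To make that one-instruction difference concrete, here is a minimal C sketch (plain C rather than asm; the function names are hypothetical, not from the answer) of the three per-character variants. All three share the same alphabetic check and differ only in whether bit `0x20` is toggled, cleared, or set:

```c
#include <stdbool.h>

/* True for 'A'..'Z' and 'a'..'z' in ASCII: setting bit 0x20 folds upper
   case onto lower case, so only one range needs to be checked. */
static bool ascii_is_alpha(unsigned char c)
{
    unsigned char lower = c | 0x20;
    return lower >= 'a' && lower <= 'z';
}

/* Flip the case of a letter (what the question asks for): xor 0x20. */
static unsigned char ascii_flip_case(unsigned char c)
{
    return ascii_is_alpha(c) ? (unsigned char)(c ^ 0x20) : c;
}

/* Force upper case: and ~0x20 clears the case bit. */
static unsigned char ascii_to_upper(unsigned char c)
{
    return ascii_is_alpha(c) ? (unsigned char)(c & ~0x20) : c;
}

/* Force lower case: or 0x20 sets the case bit. */
static unsigned char ascii_to_lower(unsigned char c)
{
    return ascii_is_alpha(c) ? (unsigned char)(c | 0x20) : c;
}
```

Non-letters such as `'['`, `'@'` or `'{'` pass through all three unchanged.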
Reading this answer is probably only useful once you understand the basic principles involved in solving this with not-so-optimized code. OTOH, there are very few operations actually needed, so there's not much code to grok. And I did comment it heavily. There are many helpful links in the x86 tag wiki, from tutorials to reference guides to performance tuning.
Converting between lower and upper case alphabetic ASCII characters only requires setting or clearing the `0x20` bit, because the ASCII character set is laid out with the ranges 32 from each other, and not crossing a mod32 boundary.

For each byte:

- make a copy and unconditionally OR it with `0x20`
- check if it's between `'a'` and `'z'`
- if so, flip the ASCII alphabetic case bit using `xor` and store the result back into the array.

Doing the ASCII `isalpha(3)` test this way is safe: The only source bytes that end up in the `'a'`..`'z'` range from setting that bit are the upper-case alphabetic characters. It's just math that works for any two equal-sized ranges that don't cross a `%32` boundary. (Or a `%64` boundary if the relevant bit was `0x40`, for example.)

To do the compare even more efficiently, I use the unsigned-compare trick so there's only one conditional branch inside the loop (other than the loop condition itself). See the comments in the code for an explanation.
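Those per-byte steps can be sketched in C as follows. This is an illustration, not the answer's asm: the name `flip_case` and the `size_t` length are assumptions. The cast to `unsigned` is where the single-branch range check comes from, because `lower - 'a'` wraps around to a huge value whenever `lower < 'a'`:

```c
#include <stddef.h>

/* Flip the case of every ASCII letter in buf[0..len), leaving all other
   bytes untouched (non-letters are never written). */
void flip_case(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        /* Copy with bit 0x20 set: 'A'..'Z' become 'a'..'z'. */
        unsigned char lower = c | 0x20;
        /* Unsigned-compare trick: one compare checks both ends of 'a'..'z'. */
        if ((unsigned)(lower - 'a') <= (unsigned)('z' - 'a')) {
            buf[i] = (char)(c ^ 0x20);  /* flip the case bit and store */
        }
    }
}
```

For example, with `char s[] = "Hello"`, calling `flip_case(s, 5)` leaves `"hELLO"` in `s`.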
This code might be more readable if some of the "design doc" stuff was in a block outside the code. It clutters things up a lot, and makes it look like there's a lot of code, but really there are very few instructions. (They're just hard to explain with short comments. Commenting code is tricky: comments that are too obvious are just clutter and take time away from reading the code and the useful comments.)
Vectorized

Actually for x86 I'd use SSE or AVX to do 16B at a time, doing the same algorithm, but doing the comparisons with two `pcmpgtb`. And of course unconditionally storing the results, so an array of all non-alphabetic characters would still be dirtied in the cache, using more memory bandwidth.

There's no unsigned SSE compare, but we can still range-shift the range we're looking for down to the bottom. There are no values less than `-128`, so in a signed compare it works the way `0` does in an unsigned compare.

To do this, subtract `128`. (Or add, or xor (carryless add); there's nowhere for the carry / borrow to go.) This can be done in the same operation as subtracting `'a'`.

Then use the compare result as a mask to zero out bytes in a vector of `0x20`, so only the alphabetic characters get XORed with 0x20. (0 is the identity element for XOR/add/sub, which is often really handy for SIMD conditionals.)

See also a `strtoupper` version that has been tested, and code to call it in a loop, including handling of non-multiple-of-16 inputs, on implicit-length C strings (searching for the terminating 0 on the fly).

This compiles to nice code, even without AVX, with only one extra `movdqa` to save a copy of a register. See the godbolt link for two earlier versions (one using two compares to keep it simple, another using `pblendvb` before I remembered to mask the vector of `0x20`s instead of the result).

This same idea of using a branchless test would also work for the byte loop:
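For illustration, here is a C sketch of that branchless byte loop (the function name is hypothetical). It follows the `mask <<= 5` idea suggested later in this answer: the range check becomes a 0/1 value (what `setcc` produces in asm), which is shifted into the `0x20` position and XORed in. Whether a given compiler actually emits `setcc` for it is not guaranteed:

```c
#include <stddef.h>

/* Branchless variant: no data-dependent branch, but every byte is
   stored back, whether or not it changed. */
void flip_case_branchless(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        unsigned lower = c | 0x20u;
        /* 1 for an ASCII letter, 0 otherwise (unsigned-compare trick). */
        unsigned is_alpha = (lower - 'a') <= (unsigned)('z' - 'a');
        buf[i] = (char)(c ^ (is_alpha << 5));  /* 1 << 5 == 0x20 */
    }
}
```

The trade-off is the same one the SIMD version makes: unconditional stores in exchange for a loop body without a hard-to-predict branch.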
For 64-bit code, just use `rsi` instead of `esi`. Everything else is the same.

Apparently MSVC inline asm doesn't allow `.label` local-symbol names. I changed them for the first version (with conditional branch), but not this one.

Using `movzx eax, byte [esi]` might be slightly better on some CPUs, to avoid a false dependency on the value of `eax` on function entry. OTOH, only AMD has that problem (and Silvermont), but `movzx` isn't quite as cheap as a load on AMD. (It is on Intel: one uop that only uses a load port, not an ALU port.) Operating on `al` after that is still good, since it avoids a partial-register stall (or extra instructions to avoid it) from reading `eax` after `setcc` writes `al`. (There is no `setcc r/m32`, only `r/m8`, unfortunately.)

I have to wonder what a professor would think if anyone handed in code like this for an assignment like that. :P I doubt even a smart compiler would use that `setcc`/`shift` trick unless you led the compiler towards it. (Maybe `unsigned mask = (tmp>='a' && tmp<='z'); mask <<= 5; a[i] ^= mask;` or something.) Compilers do know about the unsigned-compare trick, but gcc doesn't use it in some cases for non-compile-time-constant range checks, even when it can prove that the range is small enough.

Thanks to @KemyLand's helpful breakdown of the assembly code, I have figured out how to convert uppercase to lowercase and vice versa.
Feel free to help explain what I might have missed! Thank you all for helping me understand x86 assembly better.
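In an ASCII table, the uppercase letters ('A'..'Z', 0x41..0x5A) and the lowercase letters ('a'..'z', 0x61..0x7A) each form a contiguous block, and each lowercase letter is exactly 0x20 above its uppercase counterpart.
So you can see that by toggling the 6th bit (0x20) you transform from upper to lower case and back.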
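That layout can be checked with a short C sketch (the function `check_case_table` is hypothetical, and an ASCII execution character set is assumed). It prints each letter pair with its codes and confirms that the two cases differ only in bit `0x20`:

```c
#include <stdbool.h>
#include <stdio.h>

/* Walk the 26 letter pairs: print each pair's ASCII codes and check that
   toggling bit 0x20 (the 6th bit, counting from 1) maps one case onto the
   other. Returns true if every pair passes. */
bool check_case_table(void)
{
    bool ok = true;
    for (int i = 0; i < 26; i++) {
        int upper = 'A' + i;   /* 0x41 .. 0x5A */
        int lower = 'a' + i;   /* 0x61 .. 0x7A */
        printf("%c = 0x%02X    %c = 0x%02X\n", upper, upper, lower, lower);
        if ((upper ^ 0x20) != lower || (lower ^ 0x20) != upper)
            ok = false;
    }
    return ok;
}
```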
In ASCII, 'a'-'z' and 'A'-'Z' are equivalent except for one bit, 0x20.
Your friend here is XOR.
If you have a char (either 'A'-'Z' or 'a'-'z'), XORing it with 0x20 will toggle the case.
Before XORing, doing a range check makes sense (to see if the value is really a letter).
You can simplify this range check by ANDing the value to check with 0xDF (~0x20), which will turn 'a' into 'A' and 'z' into 'Z', and then doing the range check only once.
(If you only reject values below 'A' and above 'z', you will still let through the characters in between, such as '[' and ']'.)
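A minimal C sketch of that approach, assuming a byte buffer with an explicit length and a hypothetical `toggle_case` name. Clearing bit `0x20` (AND with `0xDF`) folds both cases onto `'A'`..`'Z'`, so one range check suffices, and the XOR on the original byte does the toggle:

```c
#include <stddef.h>

/* Toggle the case of every ASCII letter in buf[0..len). */
void toggle_case(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        unsigned char upper = c & 0xDF;      /* fold 'a'..'z' onto 'A'..'Z' */
        if (upper >= 'A' && upper <= 'Z')    /* really a letter? */
            buf[i] = (char)(c ^ 0x20);       /* XOR toggles the case */
    }
}
```

Bytes such as `'['` (0x5B) or `` '`' `` (0x60) fold to 0x5B and 0x40, both outside `'A'`..`'Z'`, so they are left alone.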
For clarity's sake, I'll just use pure assembly and assume that...

- `char_array` is a 32-bit pointer at `[ebp+8]`.
- `array_size` is a two's complement 32-bit number at `[ebp+12]`.
- `char`'s encoding is ASCII.

You should be able to translate this into inline assembly yourself. Now, if you look at the table everyone is supposed to remember but barely anyone does, you'll notice some important details...
- `A` through `Z` map into codes `0x41` through `0x5A`, respectively.
- `a` through `z` map into codes `0x61` through `0x7A`, respectively.

As a result, the algorithm would be...
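Here is a plain C sketch of one such algorithm, built only from the two ranges above. It reuses the question's `changeCase` signature; adding or subtracting `0x20` is one choice among several (XOR with `0x20` would work equally well):

```c
/* Walk the array once: codes 0x41..0x5A move up by 0x20, codes 0x61..0x7A
   move down by 0x20, and every other byte is left alone. */
void changeCase(char char_array[], int array_size)
{
    for (int i = 0; i < array_size; i++) {
        char c = char_array[i];
        if (c >= 0x41 && c <= 0x5A)              /* 'A'..'Z' */
            char_array[i] = (char)(c + 0x20);    /* to lower case */
        else if (c >= 0x61 && c <= 0x7A)         /* 'a'..'z' */
            char_array[i] = (char)(c - 0x20);    /* to upper case */
    }
}
```

Note that each element is one byte, so the index is not scaled; in the assembly version the element address is simply `char_array + i`.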
Now, let's translate this into assembly...
Once the code reaches `.end_loop`, you're done. I hope this has shed some light on it for you!