Why do multiple strings overlap/overwrite in my ou

Posted 2019-07-28 21:21

Question:

I'm having a problem displaying 4 different strings on a single line in 8086 assembly. The output should be "You are", "first name", "middle name", and "last name". It works fine with the first two, but the last two overlap with the first one: "You are" ends up being overwritten by "middle name", which in turn gets overwritten by "last name". If I print a new line before each of the last two, it prints out fine, but I want to display all 4 strings on one line, not on 3 lines. I tried searching the net, but most answers are limited to displaying only 2 strings.

;=====output======

mov ah, 09
mov dx, offset crlf ;next line
int 21h

mov ah, 09
mov dx, offset msg4             ;displays "You are"
int 21h

mov ah, 09
mov dx, offset string1 + 2      ;displays inputted "first name"
int 21h 

mov ah, 09
mov dx, offset string3 + 2       ; this should appear next to string1,
int 21h                          ; not rewrite msg4...

mov ah, 09
mov dx, offset string2 + 2       ; this should appear next to string3,
int 21h                          ; not rewrite msg4 and string3

This is what the output ends up like:

Enter 1st name: Helena
Enter last name: Ramos
Enter middle name: Ang

Ramosre Helena                    ;"Ang" rewrites "You are", and then
                                  ;"Ramos" rewrites it again

                                 ; This is what I want to see: 
                                 ;     You are Helena Ang Ramos

I'm pretty much a newbie in assembly, and my professor isn't exactly the most helpful teacher, given that we don't have any books, class handouts only define the instructions, and lab exercises are almost copy-paste the code that he wrote, so most of my classmates are self-taught in actual programming. This is only a small part of the homework, where the actual homework requires us to display the middle initial instead of the middle name, but I can't even get it to display all 4 strings properly! At this point, I have a feeling there's a problem with how the strings are pushed in the stack, but my limited knowledge prevents me from figuring out why.

Full code if you're interested:

org  100h   
.model small
.stack 200

.data
msg1 db "Enter 1st name: $"
string1 db 50,?,50 dup ('$')
msg2 db 0ah, 0dh, "Enter last name: $"
string2 db 50,?,50 dup ('$')
msg3 db 0ah, 0dh, "Enter middle name: $"
string3 db 50,?,50 dup ('$')

msg4 db 0ah, 0dh, "You are $"

crlf db 0ah, 0dh, '$'

.code

mov ax, @data
mov ds, ax

mov ah, 09
mov dx, offset msg1
int 21h

mov ah, 0ah
mov dx, offset string1 ;input first name
int 21h

mov ah, 09
mov dx, offset msg2
int 21h

mov ah, 0ah
mov dx, offset string2 ;input last name
int 21h

mov ah, 09
mov dx, offset msg3
int 21h

mov ah, 0ah
mov dx, offset string3 ;input middle name
int 21h

;=====output======

mov ah, 09
mov dx, offset crlf ;next line
int 21h

mov ah, 09
mov dx, offset msg4             ;displays "You are"
int 21h

mov ah, 09
mov dx, offset string1 + 2      ;displays inputted "first name"
int 21h 

mov ah, 09
mov dx, offset string3 + 2       ; this should appear next to string1,
int 21h                          ; not rewrite msg4...

mov ah, 09
mov dx, offset string2 + 2       ; this should appear next to string3,
int 21h                          ; not rewrite msg4 and string3

Answer 1:

You should check what happens in a debugger.

If you did, you would see that after entering "Helena" at the first prompt, the memory at address string1 contains:

32 06 48 65 6C 65 6E 61 0D 24 24 24 24 ...

The important thing in this data is the value 0D, also called CR (carriage return), at address string1 + 2 + 6 (+2 skips the two header bytes to reach the string data, +6 is the length of the input as stored in the second byte; DOS does not count the terminating CR in that length).

The same applies to the other two inputs.

Once you start outputting the final line, you write to the screen:

You are Helena

and then comes the first CR, which returns the cursor to the start of the line. It is not followed by LF (0Ah), so the cursor does not also move one line down; instead, the output of the second string simply overwrites the beginning of the line.
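
To see the effect in isolation, here is a minimal sketch, assuming emu8086/MASM-style syntax and a .COM program. The label demo and the hard-coded names are made up for illustration; the bare 0Dh bytes stand in for the CRs that the input service leaves behind each typed name:

```asm
org 100h                    ; .COM program: ds already equals cs

    mov ah, 9               ; DOS: print '$'-terminated string at ds:dx
    mov dx, offset demo
    int 21h

    mov ax, 4C00h           ; DOS: terminate program, return code 0
    int 21h

; hypothetical data: each bare 0Dh plays the role of the CR
; that service 0Ah stores after a typed name
demo db "You are Helena", 0Dh, "Ang", 0Dh, "Ramos", 0Dh, 0Ah, "$"
```

Each CR sends the cursor back to column 0 without moving down, so "Ang" lands on top of "You", and "Ramos" then lands on top of "Ang a". What remains on screen is Ramosre Helena, the same line as in the question.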

To fix it, do this before displaying each name entered by the user:

    mov ah, 09
    mov dx, offset string1 + 2      ;displays inputted "first name"

    movzx bx, byte ptr [dx-1]       ; bx = length of input (CR not counted)
    mov byte ptr [bx + dx], '$'     ; overwrite the CR after the last input char with '$'
    int 21h 

(If real mode does not support the addressing mode [bx + dx], just do add bx,dx and then use [bx] ... My memory is hazy; I have done a lot more protected-mode x86 assembly, where the addressing modes are much more relaxed and universal.)

And if the 8086 doesn't even have movzx, then xor bx,bx followed by mov bl,[dx-1] does the same thing (it reads a byte from memory and zero-extends it to a word).

EDIT: actually, you may want to overwrite that CR with a SPACE ' ' instead, to put a space between the names... and for the last string printed you may overwrite it with mov word ptr [bx+dx],0A0Dh to write a CR+LF pair at its end for a new line, but then you should add extra bytes after that buffer to avoid a memory overwrite with the longest possible input.


EDIT: some more comments...

"don't have any books"

It's the age of the Internet. Teaching you 8086 (probably in emu8086, I guess) is a sort of cruel joke in itself; then again, this knowledge will give you a new perspective on anything programming-related in the future, so even teaching 8086 is worth it. You should be able to google some emu8086 tutorial, although I'm not sure about the quality: judging by some questions on SO, it's very difficult to find a tutorial that covers only the basics, yet covers them properly. Then there's the book "The Art of Assembly Language Programming", which is free to read in its electronic form for personal use. It is very thorough and detailed, but also huge; you may still want to check the 16-bit DOS edition, if your course is really that bad, and give a few chapters a quick look to identify areas where proper study on your side is required.

"I have a feeling there's a problem with how the strings are pushed in the stack"

You didn't touch the stack anywhere in your code... in this light your comment sounds quite scary; you should probably really dig into that book, even if it means reading through 200-300 pages. After all, you didn't write "urgent", so you probably have a few months to catch up on the basics of assembly and computer architecture.


FIX of "SPACE EDIT":

But when you overwrite the CR with a space, the name will no longer be '$'-terminated if the user enters the longest possible name (the typed characters plus the CR then fill all 50 bytes of the buffer), so you should rather define your buffers with a fixed terminator beyond them, like this:

string1 db 50,0,51 dup ('$')   ; after 50 byte buffer there's one more '$'

This is one of the hardest parts of ASM programming: avoiding buffer/stack overflow bugs that stem from wrong data definitions and unsafe use of memory. Always test your code with minimum/maximum inputs in a debugger, and watch the memory content to see whether it behaves as expected, or whether some unexpected memory overwrite happened, and where. You may also want to define some guard values between buffers, like:

string1 db 50,0,51 dup ('$')
    db 0FFh
msg2 db 0ah, 0dh, "Enter last name: $"

Then if you see in the debugger that some corner-case input made the FF byte disappear, you know your source is wrong (for example, if the first byte were 52, a 51-character name plus its CR would no longer fit in the 51 data bytes and the CR would land on the guard).
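
The guard byte can also be checked by the program itself instead of by eye. This is a rough fragment to drop into the existing program, not a complete one, assuming emu8086/MASM-style syntax. The labels guard1, guard_msg and guard_ok are names made up here:

```asm
; --- data: give the guard byte a label so code can refer to it ---
string1   db 50,0,51 dup ('$')
guard1    db 0FFh                                  ; hypothetical label for the guard
guard_msg db 0dh, 0ah, "string1 overflowed!$"      ; hypothetical warning text

; --- code: run this after the input (and any end-of-string patching) ---
    cmp guard1, 0FFh        ; is the guard byte still intact?
    je  guard_ok
    mov ah, 9               ; no: something wrote past the string1 buffer
    mov dx, offset guard_msg
    int 21h
guard_ok:
```

A check like this only reports that an overwrite happened. Finding out which instruction caused it still needs the debugger.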


Fix with valid 8086 code for real mode (the original suggestion used the addressing mode [dx], which is not legal in real mode):

First add a procedure at the end of the code that overwrites the CR stored after the last inputted character, plus one more byte, of the Pascal-like string (the length is stored in memory in the byte just ahead of the string itself):

; input:
; dx = address of string ([dx-1] must contain length of string, CR not counted)
; ax = two chars to be written at the end of string (al = first, ah = second)
changeEndOfInputString:
    push   si           ; preserve original si and bx values
    push   bx
    mov    si,dx        ; use SI for addressing in real mode
    xor    bx,bx        ; bx = 0
    mov    bl,[si-1]    ; bx = (zero extended) string length
    mov    [si+bx],ax   ; overwrite the CR after the last char + one more byte
    pop    bx           ; restore bx and si and return
    pop    si
    ret

Now the code that displays the name strings uses that procedure to add a space or a new line where needed.

    ;=====output======

    ;display "You are "
    mov ah, 9
    mov dx, offset msg4
    int 21h

    ;display inputted "first name" with space added
    mov dx, offset string1 + 2
    mov ax, 2420h   ; 20h = ' ', 24h = '$' (ASCII encoding)
    call changeEndOfInputString
    mov ah, 9
    int 21h 

    ;display inputted "middle name" with space added
    mov dx, offset string3 + 2
    mov ax, 2420h   ; 20h = ' ', 24h = '$' (ASCII encoding)
    call changeEndOfInputString
    mov ah, 9
    int 21h

    ;display inputted "last name" with CR+LF added
    mov dx, offset string2 + 2
    mov ax, 0A0Dh   ; 0Dh = CR, 0Ah = LF (DOS "new line")
    call changeEndOfInputString
    mov ah, 9
    int 21h
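
Because changeEndOfInputString sits at the end of the code, the main flow must not run into it after the last int 21h. The code in the question has no exit call, so a sketch of that missing step, using the standard DOS terminate service (AH=4Ch), might look like this:

```asm
    ; end of main flow: return to DOS before the procedures start
    mov ax, 4C00h   ; ah = 4Ch (terminate process), al = 0 (return code)
    int 21h

changeEndOfInputString:
    ; ... procedure body as defined earlier ...
```

Without the exit, execution would continue straight into the procedure and hit its ret with no matching call on the stack.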

And make sure your string buffers have additional bytes at the end to accommodate these modifications:

string1 db 50,0,51 dup ('$')   ; first name buffer
string2 db 50,0,52 dup ('$')   ; last name buffer
string3 db 50,0,51 dup ('$')   ; middle name buffer

The "last name" buffer needs 52 '$' bytes: with the longest possible input (the buffer limit of 50 includes the CR, so the CR can end up in the 50th byte), the 50th and 51st bytes are overwritten with CR+LF, so the 52nd '$' is what terminates the string for the int 21h, AH=9 service.

The first and middle name buffers are fine with 51 '$' bytes: the input is modified with ' '+'$', so even with the longest input the 51st byte is still '$' after the modification.

Also set the second byte of the 0Ah service buffers to 0, not ?, because in some DOS versions it is actually an input value that tells DOS how much of the buffer content is valid for input editing, and you don't put any valid string inside the buffer before the call.

And final note:

crlf db 0ah, 0dh, '$'

This is LF+CR (the wrong order); the DOS "new line" should be CR+LF, i.e. 13, 10. The 10, 13 order mostly works as expected, but in the assembly world this is a bug anyway; you are just lucky that it doesn't have a bigger impact (you have it wrong in all the string definitions).
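
With the order corrected, the definitions from the question would look like this (same labels and text, only the two control bytes swapped):

```asm
msg2 db 0dh, 0ah, "Enter last name: $"
msg3 db 0dh, 0ah, "Enter middle name: $"
msg4 db 0dh, 0ah, "You are $"
crlf db 0dh, 0ah, '$'   ; CR (13) first, then LF (10)
```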

so I've tried replacing it with bx, and there's no error but it still overwrites.

You absolutely MUST learn to use the debugger, so that you have some chance of checking where the code doesn't do what you expect, and what the memory and registers contain in such a situation.

You have no chance of programming in assembly without a debugger. It took me three reads of your original question and a thorough, slow (about 10 minutes) simulation in my head (I was too lazy to search for an 8086 emulator and assembler compatible with your syntax) until I figured out why your strings get overwritten, as the "only CR" is not that easy to spot from the source. And I did years of x86 assembly programming, writing megabytes of ASM source code, and it was still tricky to spot this mistake. If I had run it through a debugger, I would have spotted the problem immediately after checking the content of the string1 buffer after user input.

Or, to explain the need for a debugger with a different example: when I was learning ASM programming, I didn't have a computer, so I had to write code on paper, and then I had a few minutes at school to try it out, usually once per week. Usually it crashed due to some bug, but I didn't have enough time to debug it on the machine, so I had to find the bug at home on paper (and there was no Internet back around 1985 to ask on SO). It often took me 3-5 weeks to get a working version with all bugs fixed. If I had had a computer at will, with a debugger, I would probably have done all the fixing in 1h. Then again, now I see many bugs just by reading the source (even in other programming languages); that's how my brain turned out after that paper experience, paying attention to every dot, comma and number...