ASM Interpreter: How are local variables stored?

Posted 2019-07-22 13:17

For my homework I need to write a very small virtual 16-bit assembler interpreter in C#. It simulates the RAM with a byte array (64k) and the registers with variables (A, B, C, ...). Now I need a way to store local variables; Google says that they are allocated on the stack.

But the thing that's unclear to me is: when they are allocated on the stack (with push...), how does the interpreter access them when they are used later?

See the following 2 lines:

pi INT 3
mov A, pi

In the first line, pi is allocated on the stack; in the second line, pi is used. But how should the interpreter know where pi is on the stack to access its data? (My stack is a byte array too, with 2 helper functions (push, pop); there is also a pointer to the top of the stack.)

4 Answers
唯我独甜
Reply #2 · 2019-07-22 13:37

Typically, stack data is accessed relative to the stack pointer, a CPU register that points to the last element stored on the stack. You may think of it as an index into the memory of the emulated CPU. Every time you push something onto the stack, the stack pointer is decremented by the size of that something, and the value is stored in the emulated memory at the address after the decrement. Whenever you pop something off the stack, the value is read from the address held in the stack pointer, and then the stack pointer is incremented by the size of that something. That's how stacks work on many different CPUs.
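To make the push/pop description concrete, here is a minimal C# sketch of a stack living inside the emulated RAM. The class name StackMachine, the initial stack pointer value 0xFFFE and the little-endian byte order are my assumptions, not something the question specifies.

```csharp
using System;

// Hypothetical minimal machine: 64 KiB of RAM plus a 16-bit stack pointer.
class StackMachine
{
    public readonly byte[] Ram = new byte[65536];
    public ushort SP = 0xFFFE; // assumed initial stack pointer; any free address works

    // Push: decrement SP by the operand size (2 bytes), then store at the new SP.
    public void Push(ushort value)
    {
        SP -= 2;
        Ram[SP] = (byte)(value & 0xFF);             // low byte (little-endian, an assumption)
        Ram[(ushort)(SP + 1)] = (byte)(value >> 8); // high byte
    }

    // Pop: read the word at SP, then increment SP by the operand size.
    public ushort Pop()
    {
        ushort value = (ushort)(Ram[SP] | (Ram[(ushort)(SP + 1)] << 8));
        SP += 2;
        return value;
    }
}

class Demo
{
    static void Main()
    {
        var m = new StackMachine();
        m.Push(0x1234);
        m.Push(0xBEEF);
        Console.WriteLine(m.Pop().ToString("X4")); // BEEF
        Console.WriteLine(m.Pop().ToString("X4")); // 1234
    }
}
```

Because the stack is just a region of Ram, anything on it can also be read by address without popping, which is what the instructions in the rest of this answer rely on.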

If you're implementing a CPU emulator or a CPU instruction emulator/interpreter, you don't care much about variables. What you care about are the CPU instructions that manipulate CPU registers and memory, because your program is expressed in terms of CPU instructions. They (the instructions) have to keep track of all local variables stored on the stack, that is, their location relative to the current value of the stack pointer.

For example, if you consider a simple subroutine that adds two 16-bit integer values passed to it on the stack, it could look something like this in e.g. 16-bit x86 assembly:

myadd:
    push bp ; we'll be accessing stack through bp (can't do that through sp because there's no sp-relative memory addressing in 16-bit mode), so, let's save bp first
    mov bp, sp ; bp is equal to the stack pointer
    mov ax, word ptr [bp + 4] ; load ax with 1st parameter stored at bp+4 (sp+4)
    add ax, word ptr [bp + 6] ; add to ax 2nd parameter stored at bp+6 (sp+6)
    pop bp ; restore bp
    ret ; near return to the caller at address stored at sp (address after call myadd), the result/sum is in ax

And the caller may look like this:

    push word 2 ; prepare/store 2nd parameter on the stack
    push word 1 ; prepare/store 1st parameter on the stack
    call myadd ; near call, pushes address of next instruction (add), jumps to myadd
    add sp, 4 ; remove myadd's parameters (1 and 2) from the stack
    ; ax should now contain 3
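
As a rough illustration of what an interpreter does for each of those instructions, the following C# sketch replays the caller and myadd by hand, one statement per instruction. The Cpu class, its helper names, the initial SP and the return address 0x0100 are all made up for the example; a real interpreter would decode these steps from the program text instead of hard-coding them.

```csharp
using System;

// Hypothetical emulated CPU state; register names follow the x86 example.
class Cpu
{
    public readonly byte[] Ram = new byte[65536];
    public ushort AX, BP, SP = 0xFFFE; // initial SP is an assumption

    public ushort ReadWord(int addr) =>
        (ushort)(Ram[addr & 0xFFFF] | (Ram[(addr + 1) & 0xFFFF] << 8));

    public void WriteWord(int addr, ushort value)
    {
        Ram[addr & 0xFFFF] = (byte)(value & 0xFF);
        Ram[(addr + 1) & 0xFFFF] = (byte)(value >> 8);
    }

    public void Push(ushort value) { SP -= 2; WriteWord(SP, value); }
    public ushort Pop() { ushort v = ReadWord(SP); SP += 2; return v; }
}

class Demo
{
    // Executes the same steps as the caller and myadd, one statement per instruction.
    public static ushort RunMyAdd(Cpu cpu, ushort first, ushort second)
    {
        const ushort ReturnAddress = 0x0100; // made-up address of the instruction after "call"

        cpu.Push(second);                    // push word 2
        cpu.Push(first);                     // push word 1
        cpu.Push(ReturnAddress);             // call myadd (pushes the return address)

        cpu.Push(cpu.BP);                    // push bp
        cpu.BP = cpu.SP;                     // mov bp, sp
        cpu.AX = cpu.ReadWord(cpu.BP + 4);   // mov ax, [bp + 4]
        cpu.AX += cpu.ReadWord(cpu.BP + 6);  // add ax, [bp + 6]
        cpu.BP = cpu.Pop();                  // pop bp
        cpu.Pop();                           // ret (pops the return address)

        cpu.SP += 4;                         // add sp, 4
        return cpu.AX;
    }

    static void Main()
    {
        var cpu = new Cpu();
        Console.WriteLine(RunMyAdd(cpu, 1, 2)); // 3
    }
}
```

Note that the interpreter never needs to know the parameters by name: the offsets 4 and 6 are part of the instructions themselves.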
Explosion°爆炸
Reply #3 · 2019-07-22 13:40

The answer is: it depends. You, as the language designer, should define the rules for the visibility of variables (if a variable name is defined, within which part of the source code is the name available?) and for hiding (if another object with the same name is defined within the visibility area of the first, which name wins?). Different languages have different rules; just compare JavaScript and C++.

So, I would do it this way. (1) Introduce a notion of namespace: the list of names visible at a certain point of the source file. (Note that this is not the same as C++'s namespace notion.) The namespace should be able to resolve a name to some appropriate object. (2) Implement rules for changing namespaces when your interpreter moves from one procedure to another, from one file to another, from one block to another, sees a declaration or the end of a block, etc.
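
One possible shape for such a namespace, as a sketch only: a chain of dictionaries where the innermost scope is searched first, so an inner declaration hides an outer one. The class and method names are hypothetical, and I store a plain int per name where a real interpreter would probably store an address or a richer object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scope chain: the innermost scope is searched first.
class Namespace
{
    private readonly List<Dictionary<string, int>> scopes =
        new List<Dictionary<string, int>> { new Dictionary<string, int>() };

    // Called when the interpreter enters or leaves a procedure or block.
    public void EnterScope() => scopes.Add(new Dictionary<string, int>());
    public void LeaveScope() => scopes.RemoveAt(scopes.Count - 1);

    // Called when the interpreter sees a declaration.
    public void Declare(string name, int value) => scopes[scopes.Count - 1][name] = value;

    // Resolves a name, letting inner declarations hide outer ones.
    public bool TryResolve(string name, out int value)
    {
        for (int i = scopes.Count - 1; i >= 0; i--)
            if (scopes[i].TryGetValue(name, out value))
                return true;
        value = 0;
        return false;
    }
}

class Demo
{
    static void Main()
    {
        var ns = new Namespace();
        ns.Declare("pi", 3);          // outer declaration
        ns.EnterScope();
        ns.Declare("pi", 4);          // inner declaration hides the outer one
        ns.TryResolve("pi", out int inner);
        ns.LeaveScope();
        ns.TryResolve("pi", out int outer);
        Console.WriteLine(inner);     // 4
        Console.WriteLine(outer);     // 3
    }
}
```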

These steps are basically valid for most languages, not just assembler.

(I think Google's reference to "allocation on stack" refers to the idea of processing each subroutine in a separate subroutine of the interpreter and redefining the namespace there locally, therefore "on stack", so that it is automatically popped when the procedure finishes.)

混吃等死
Reply #4 · 2019-07-22 13:42

'google says that they are allocated on the Stack'

This is how it is implemented in real computers, but that is not the whole story.

If you want to write a virtual interpreter, you need to use a data structure called a 'hash table'.

Well, this is a homework question, so no direct answer :P But the following code shows how to use a hash table. Store the variable names and values in hash tables.

using System;
using System.Collections;

class Program
{
    static Hashtable GetHashtable()
    {
        // Create and return new Hashtable.
        Hashtable hashtable = new Hashtable();
        hashtable.Add("Area", 1000);
        hashtable.Add("Perimeter", 55);
        hashtable.Add("Mortgage", 540);
        return hashtable;
    }

    static void Main()
    {
        Hashtable hashtable = GetHashtable();

        // See if the Hashtable contains this key.
        Console.WriteLine(hashtable.ContainsKey("Perimeter"));

        // Test the Contains method. It works the same way.
        Console.WriteLine(hashtable.Contains("Area"));

        // Get value of Area with indexer.
        int value = (int)hashtable["Area"];

        // Write the value of Area.
        Console.WriteLine(value);
    }
}
smile是对你的礼貌
Reply #5 · 2019-07-22 13:53

Typically there is no separate stack memory. Instead, the stack lives in the regular RAM, so all you have is the stack pointer that keeps track of it.

Also typically, local variables are allocated at the beginning of a subroutine by copying the stack pointer to another register, then moving the stack pointer to make room for the variables:

mov bp, sp ;copy stack pointer
sub sp, 4 ;make room for two integer variables

Accessing local variables is done using the copy of the stack pointer:

mov A, [bp-2] ;get first integer
mov B, [bp-4] ;get second integer

When you leave the subroutine, you restore the stack pointer to deallocate the local variables:

mov sp, bp ;restore stack
ret ;exit from subroutine
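
To tie this back to the question of how the interpreter knows where a named local is: one option, sketched here under my own assumptions, is to record an offset relative to bp for each local name while parsing the subroutine, and to turn every use of the name into a bp-relative memory access. With sub sp, 4 the two 16-bit locals occupy the four bytes just below bp, so the offsets are -2 and -4. The names LocalsDemo, Run, "first" and "second", and the initial SP, are hypothetical.

```csharp
using System;
using System.Collections.Generic;

class LocalsDemo
{
    static readonly byte[] Ram = new byte[65536];
    static ushort BP, SP = 0xFFFE; // initial SP is an assumption

    static ushort ReadWord(int addr) =>
        (ushort)(Ram[addr & 0xFFFF] | (Ram[(addr + 1) & 0xFFFF] << 8));

    static void WriteWord(int addr, ushort value)
    {
        Ram[addr & 0xFFFF] = (byte)(value & 0xFF);
        Ram[(addr + 1) & 0xFFFF] = (byte)(value >> 8);
    }

    // Runs the prologue, uses two locals, runs the epilogue; returns their sum.
    public static int Run()
    {
        // Filled in while parsing the subroutine: name -> offset relative to BP.
        var offsets = new Dictionary<string, int>();
        int frameSize = 0;
        foreach (string name in new[] { "first", "second" })
        {
            frameSize += 2;             // each local is a 16-bit integer
            offsets[name] = -frameSize; // first -> -2, second -> -4
        }

        BP = SP;                        // mov bp, sp
        SP -= (ushort)frameSize;        // sub sp, 4

        WriteWord(BP + offsets["first"], 3);   // store 3 at [bp-2]
        WriteWord(BP + offsets["second"], 39); // store 39 at [bp-4]
        int sum = ReadWord(BP + offsets["first"]) + ReadWord(BP + offsets["second"]);

        SP = BP;                        // mov sp, bp
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(Run());             // 42
        Console.WriteLine(SP.ToString("X4")); // FFFE
    }
}
```

The offsets table only exists while translating names; at run time the locals are found purely through bp, so recursive calls each get their own copy.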

The syntax that you use in the question is usually used to declare global variables, not local variables:

.data
pi int 3 ;declare a label and allocate room for an int in the program
.code
mov A, pi ;use the address of the label to access the int
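
For the global case, a sketch of what the interpreter might do with the declaration and the later use: reserve room in RAM when it sees the label, remember the address in a table, and look that address up whenever the label appears as an operand. The data area start 0x8000, the class name and the method names are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;

class GlobalsDemo
{
    public static readonly byte[] Ram = new byte[65536];
    static readonly Dictionary<string, ushort> Labels = new Dictionary<string, ushort>();
    static ushort nextFree = 0x8000; // made-up start of the data area

    // Handles a declaration such as "pi int 3": reserve two bytes and remember the address.
    public static ushort DeclareInt(string label, ushort initialValue)
    {
        ushort addr = nextFree;
        nextFree += 2;
        Ram[addr] = (byte)(initialValue & 0xFF); // low byte (little-endian, an assumption)
        Ram[addr + 1] = (byte)(initialValue >> 8);
        Labels[label] = addr;
        return addr;
    }

    // Handles an operand such as "pi" in "mov A, pi": look up the address, read the word.
    public static ushort Load(string label)
    {
        ushort addr = Labels[label];
        return (ushort)(Ram[addr] | (Ram[addr + 1] << 8));
    }

    static void Main()
    {
        DeclareInt("pi", 3);
        ushort A = Load("pi");
        Console.WriteLine(A); // 3
    }
}
```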