small language in python

2020-07-11 04:35发布

问题:

I'm writing what might not even be called a language in python. I currently have several operators: +, -, *, ^, fac, @, !!. fac computes a factorial, @ returns the value of a variable, !! sets a variable. The code is below. How would I go about writing a way to define functions in this simple language?

EDIT: i updated the code!

import sys, shlex, readline, os, string
List, assign, call, add, sub, div, Pow, mul, mod, fac, duf, read,\
kill, clr, STO, RET, fib, curs = {}, "set", "get", "+", "-", "/", "^", "*",\
"%", "fact", "func", "read", "kill", "clear", ">", "@", "fib", "vars"
def fact(num):
    if num == 1: return 1
    else: return num*fact(num-1)
def Simp(op, num2, num1):
    global List
    try: num1, num2 = float(num1), float(num2)
    except:
        try: num1 = float(num1)
        except:
            try: num2 = float(num2)
            except: pass
    if op == mul: return num1*num2
    elif op == div: return num1/num2
    elif op == sub: return num1-num2
    elif op == add: return num1+num2
    elif op == Pow: return num1**num2
    elif op == assign: List[num1] = num2; return "ok"
    elif op == call: return List[num1]
    elif op == fac: return fact(num1)
    elif op == duf: return "%s %s %s"%(duf, num1, num2)
    elif op == mod: return num1%num2
    elif op == kill: del List[num1]; return "ok"
    elif op == clr: os.system("clear")
    elif op == STO: List[num2] = num1; return "ok"
    elif op == RET: return List[num1]
    elif op == curs: return List
    elif op == read: List[num1] = Eval(raw_input("%s "%num1)); return "ok"
def Eval(expr):
    ops = "%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s"%(mul, add, sub, div, Pow, assign, call, fac, duf, mod, read, kill, clr, STO, RET, curs)
    stack, expr, ops = [], shlex.split(string.lower(expr)), ops.split()
    for i in expr:
        if i[0] != ';':
            if i not in ops: stack.append(i)
            elif i in ops: stack.append(Simp(i, stack.pop(), stack.pop()))
        else: stack.append("ok")
    return stack[0]
def shell():
    try:
        x = ""
        while x != "quit":
            x = raw_input("star>   ")
            try: l = Eval(x)
            except KeyError: l = "does not exist"
            except: l = "parse error!"
            if l != None: print "   =>",l,"\n"
    except (EOFError, KeyboardInterrupt): print
if len(sys.argv) > 1:
    x = open(sys.argv[1], 'r'); l = x.readlines(); x.close()
    for i in l:
        if i[0] != ";":
            i = ' '.join(i.split())
            x = Eval(i)
            if x != None: print i,"\n   =>",x,"\n"
        else: pass
    shell()
else: shell()

回答1:

Your program is very confused, and it needs to be fixed before it can be modified to support defining functions. I will do this in several steps and as I complete them, I will add them into the answer. This answer will get to be quite long.

Also, you obviously haven't decided what your language definition should be. You've decided to make your language definition sort of follow your implementation technique, and this is kind of broken, and results in a lot of pain.

First, the definition for your Simp function is really broken. It demands that everything take exactly two values off the stack, and put exactly one value back on. This is broken. The factorial function doesn't work this way, nor does the Fibonacci function, so you are forced to have a 'dummy' argument that's never used. Also, things like assigning to an element of your global list or dictionary have no reason to push values onto the stack, so you're left pushing 'ok'. This is broken and needs fixing.

Here is the version with this problem fixed. Notice that I've renamed Simp to builtin_op to more accurately reflect its purpose:

import sys, shlex, readline, os, string
List, assign, call, add, sub, div, Pow, mul, mod, fac, duf, read,\
kill, clr, STO, RET, fib, curs = {}, "set", "get", "+", "-", "/", "^", "*",\
"%", "fact", "func", "read", "kill", "clear", ">", "@", "fib", "vars"
def fact(num):
    if num == 1: return 1
    else: return num*fact(num-1)
def builtin_op(op, stack):
    global List
    if op == mul: stack.append(float(stack.pop())*float(stack.pop()))
    elif op == div: stack.append(float(stack.pop())/float(stack.pop()))
    elif op == sub: stack.append(float(stack.pop())-float(stack.pop()))
    elif op == add: stack.append(float(stack.pop())+float(stack.pop()))
    elif op == Pow: stack.append(float(stack.pop())**float(stack.pop()))
    elif op == assign: val = List[stack.pop()] = stack.pop(); stack.append(val)
    elif op == call: stack.append(List[stack.pop()])
    elif op == fac: stack.append(fact(stack.pop()))
    elif op == duf: stack.append("%s %s %s" % (duf, stack.pop(), stack.pop()))
    elif op == mod: stack.append(float(stack.pop())%float(stack.pop()))
    elif op == kill: del List[stack.pop()]
    elif op == clr: os.system("clear")
    elif op == STO: val = List[stack.pop()] = stack.pop(); stack.append(val)
    elif op == RET: stack.append(List[stack.pop()])
    elif op == curs: stack.append(List)
    elif op == read: prompt = stack.pop(); List[prompt] = Eval(raw_input("%s "%prompt)); stack.append(List[prompt])
def Eval(expr):
    ops = "%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s"%(mul, add, sub, div, Pow, assign, call, fac, duf, mod, read, kill, clr, STO, RET, curs)
    stack, expr, ops = [], shlex.split(string.lower(expr)), ops.split()
    for i in expr:
        if i[0] != ';':
            if i not in ops: stack.append(i)
            elif i in ops: builtin_op(i, stack)
        else: stack.append("ok")
    return stack[0]
def shell():
    try:
        x = ""
        while x != "quit":
            x = raw_input("star>   ")
            try: l = Eval(x)
            except KeyError: l = "does not exist"
            except: l = "parse error!"
            if l != None: print "   =>",l,"\n"
    except (EOFError, KeyboardInterrupt): print
if len(sys.argv) > 1:
    x = open(sys.argv[1], 'r'); l = x.readlines(); x.close()
    for i in l:
        if i[0] != ";":
            i = ' '.join(i.split())
            x = Eval(i)
            if x != None: print i,"\n   =>",x,"\n"
        else: pass
    shell()
else: shell()

There are still a number of problems here that aren't fixed, and I won't fix in any future version. For example, it's possible a value on the stack cannot be interpreted as a floating point number. This will cause an exception, and this exception may be thrown before the other value is read off the stack. This means that if the wrong 'types' are on the stack the stack could be in an ambiguous state after a 'parse error'. Generally you want to avoid situations like this in a language.

Defining functions is an interesting problem. In your language, evaluation is immediate. You don't have a mechanism for delaying evaluation until later. But, you're using the shlex module for parsing. And it has a way of saying that a whole group of characters (including spaces and such) are part of one entity. This gives us a quick and easy way to implement functions. You can do something like this:

star>   "3 +" add3 func

to create your function, and:

star>   2 add3 get

to call it. I used get because that's what you've assigned to call in your program.

The only problem is that the function is going to need access to the current state of the stack in order to work. You can easily feed the string for the function back into Eval, but Eval always creates a brand new stack each time it is called. In order to implement functions, this needs to be fixed. So I've added a defaulted stack argument to the Eval function. If this argument is left at its default value, Eval will still create a new stack, just like before. But if an existing stack is passed in, Eval will use it instead.

Here is the modified code:

import sys, shlex, readline, os, string
List, assign, call, add, sub, div, Pow, mul, mod, fac, duf, read,\
kill, clr, STO, RET, fib, curs = {}, "set", "get", "+", "-", "/", "^", "*",\
"%", "fact", "func", "read", "kill", "clear", ">", "@", "fib", "vars"
funcdict = {}
def fact(num):
    if num == 1: return 1
    else: return num*fact(num-1)
def builtin_op(op, stack):
    global List
    global funcdict
    if op == mul: stack.append(float(stack.pop())*float(stack.pop()))
    elif op == div: stack.append(float(stack.pop())/float(stack.pop()))
    elif op == sub: stack.append(float(stack.pop())-float(stack.pop()))
    elif op == add: stack.append(float(stack.pop())+float(stack.pop()))
    elif op == Pow: stack.append(float(stack.pop())**float(stack.pop()))
    elif op == assign: val = List[stack.pop()] = stack.pop(); stack.append(val)
    elif op == call: Eval(funcdict[stack.pop()], stack)
    elif op == fac: stack.append(fact(stack.pop()))
    elif op == duf: name = stack.pop(); funcdict[name] = stack.pop(); stack.append(name)
    elif op == mod: stack.append(float(stack.pop())%float(stack.pop()))
    elif op == kill: del List[stack.pop()]
    elif op == clr: os.system("clear")
    elif op == STO: val = List[stack.pop()] = stack.pop(); stack.append(val)
    elif op == RET: stack.append(List[stack.pop()])
    elif op == curs: stack.append(List)
    elif op == read: prompt = stack.pop(); List[prompt] = Eval(raw_input("%s "%prompt)); stack.append(List[prompt])
def Eval(expr, stack=None):
    ops = "%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s"%(mul, add, sub, div, Pow, assign, call, fac, duf, mod, read, kill, clr, STO, RET, curs)
    if stack is None:
        stack = []
    expr, ops = shlex.split(string.lower(expr)), ops.split()
    for i in expr:
        if i[0] != ';':
            if i not in ops: stack.append(i)
            elif i in ops: builtin_op(i, stack)
        else: stack.append("ok")
    return stack[0]
def shell():
    try:
        x = ""
        while x != "quit":
            x = raw_input("star>   ")
            try: l = Eval(x)
            except KeyError: l = "does not exist"
            except: l = "parse error!"
            if l != None: print "   =>",l,"\n"
    except (EOFError, KeyboardInterrupt): print
if len(sys.argv) > 1:
    x = open(sys.argv[1], 'r'); l = x.readlines(); x.close()
    for i in l:
        if i[0] != ";":
            i = ' '.join(i.split())
            x = Eval(i)
            if x != None: print i,"\n   =>",x,"\n"
        else: pass
    shell()
else: shell()

In stack based languages, two very useful built in operators are dup and swap. dup takes the top stack element and duplicates it. swap swaps the top two stack elements.

If you have dup you can implement a square function like so:

star>   "dup *" square func

Here is your program with dup and swap implemented:

import sys, shlex, readline, os, string
List, assign, call, add, sub, div, Pow, mul, mod, fac, duf, read,\
kill, clr, STO, RET, fib, curs, dup, swap = {}, "set", "get", "+", "-", "/", "^", "*",\
"%", "fact", "func", "read", "kill", "clear", ">", "@", "fib", "vars", "dup", "swap"
funcdict = {}
def fact(num):
    if num == 1: return 1
    else: return num*fact(num-1)
def builtin_op(op, stack):
    global List
    global funcdict
    if op == mul: stack.append(float(stack.pop())*float(stack.pop()))
    elif op == div: stack.append(float(stack.pop())/float(stack.pop()))
    elif op == sub: stack.append(float(stack.pop())-float(stack.pop()))
    elif op == add: stack.append(float(stack.pop())+float(stack.pop()))
    elif op == Pow: stack.append(float(stack.pop())**float(stack.pop()))
    elif op == assign: val = List[stack.pop()] = stack.pop(); stack.append(val)
    elif op == call: Eval(funcdict[stack.pop()], stack)
    elif op == fac: stack.append(fact(stack.pop()))
    elif op == duf: name = stack.pop(); funcdict[name] = stack.pop(); stack.append(name)
    elif op == mod: stack.append(float(stack.pop())%float(stack.pop()))
    elif op == kill: del List[stack.pop()]
    elif op == clr: os.system("clear")
    elif op == STO: val = List[stack.pop()] = stack.pop(); stack.append(val)
    elif op == RET: stack.append(List[stack.pop()])
    elif op == curs: stack.append(List)
    elif op == dup: val = stack.pop(); stack.append(val); stack.append(val)
    elif op == swap: val1 = stack.pop(); val2 = stack.pop(); stack.append(val1); stack.append(val2)
    elif op == read: prompt = stack.pop(); List[prompt] = Eval(raw_input("%s "%prompt)); stack.append(List[prompt])
def Eval(expr, stack=None):
    ops = "%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s"%(mul, add, sub, div, Pow, assign, call, fac, duf, mod, read, kill, clr, STO, RET, curs, dup, swap)
    if stack is None:
        stack = []
    expr, ops = shlex.split(string.lower(expr)), ops.split()
    for i in expr:
        if i[0] != ';':
            if i not in ops: stack.append(i)
            elif i in ops: builtin_op(i, stack)
        else: stack.append("ok")
    return stack[0]
def shell():
    try:
        x = ""
        while x != "quit":
            x = raw_input("star>   ")
            try: l = Eval(x)
            except KeyError: l = "does not exist"
            except: l = "parse error!"
            if l != None: print "   =>",l,"\n"
    except (EOFError, KeyboardInterrupt): print
if len(sys.argv) > 1:
    x = open(sys.argv[1], 'r'); l = x.readlines(); x.close()
    for i in l:
        if i[0] != ";":
            i = ' '.join(i.split())
            x = Eval(i)
            if x != None: print i,"\n   =>",x,"\n"
        else: pass
    shell()
else: shell()

Lastly, here is my version in Python that's much clearer (in my opinion anyway) than the Python you've written:

import shlex, functools, sys, StringIO

def bin_numeric_op(func):
    @functools.wraps(func)
    def execute(self):
        n2, n1 = self._stack.pop(), self._stack.pop()
        n1 = float(n1)
        n2 = float(n2)
        self._stack.append(func(n1, n2))
    return execute

def relational_op(func):
    @functools.wraps(func)
    def execute(self):
        n2, n1 = self._stack.pop(), self._stack.pop()
        self._stack.append(bool(func(n1, n2)))
    return execute

def bin_bool_op(func):
    @functools.wraps(func)
    def execute(self):
        n2, n1 = self._stack.pop(), self._stack.pop()
        n1 = bool(n1)
        n2 = bool(n2)
        self._stack.append(bool(func(n1, n2)))
    return execute

class Interpreter(object):
    def __init__(self):
        self._stack = []
        self._vars = {}
        self._squarestack = []

    def processToken(self, token):
        if token == '[':
            self._squarestack.append(len(self._stack))
        # Currently inside square brackets, don't execute
        elif len(self._squarestack) > 0:
            if token == ']':
                startlist = self._squarestack.pop()
                lst = self._stack[startlist:]
                self._stack[startlist:] = [tuple(lst)]
            else:
                self._stack.append(token)
        # Not current inside list and close square token, something's wrong.
        elif token == ']':
            raise ValueError("Unmatched ']'")
        elif token in self.builtin_ops:
            self.builtin_ops[token](self)
        else:
            self._stack.append(token)
    def get_stack(self):
        return self._stack
    def get_vars(self):
        return self._vars
    @bin_numeric_op
    def add(n1, n2):
        return n1 + n2
    @bin_numeric_op
    def mul(n1, n2):
        return n1 * n2
    @bin_numeric_op
    def div(n1, n2):
        return n1 / n2
    @bin_numeric_op
    def sub(n1, n2):
        return n1 - n2
    @bin_numeric_op
    def mod(n1, n2):
        return n1 % n2
    @bin_numeric_op
    def Pow(n1, n2):
        return n1**n2
    @relational_op
    def less(v1, v2):
        return v1 < v2
    @relational_op
    def lesseq(v1, v2):
        return v1 <= v2
    @relational_op
    def greater(v1, v2):
        return v1 > v2
    @relational_op
    def greatereq(v1, v2):
        return v1 > v2
    @relational_op
    def isequal(v1, v2):
        return v1 == v2
    @relational_op
    def isnotequal(v1, v2):
        return v1 != v2
    @bin_bool_op
    def bool_and(v1, v2):
        return v1 and v2
    @bin_bool_op
    def bool_or(v1, v2):
        return v1 or v2
    def bool_not(self):
        stack = self._stack
        v1 = stack.pop()
        stack.append(not v1)
    def if_func(self):
        stack = self._stack
        pred = stack.pop()
        code = stack.pop()
        if pred:
            self.run(code)
    def ifelse_func(self):
        stack = self._stack
        pred = stack.pop()
        nocode = stack.pop()
        yescode = stack.pop()
        code = yescode if pred else nocode
        self.run(code)
    def store(self):
        stack = self._stack
        value = stack.pop()
        varname = stack.pop()
        self._vars[varname] = value
    def fetch(self):
        stack = self._stack
        varname = stack.pop()
        stack.append(self._vars[varname])
    def remove(self):
        varname = self._stack.pop()
        del self._vars[varname]
    # The default argument is because this is used internally as well.
    def run(self, code=None):
        if code is None:
            code = self._stack.pop()
        for tok in code:
            self.processToken(tok)
    def dup(self):
        self._stack.append(self._stack[-1])
    def swap(self):
        self._stack[-2:] = self._stack[-1:-3:-1]
    def pop(self):
        self._stack.pop()
    def showstack(self):
        print"%r" % (self._stack,)
    def showvars(self):
        print "%r" % (self._vars,)
    builtin_ops = {
        '+': add,
        '*': mul,
        '/': div,
        '-': sub,
        '%': mod,
        '^': Pow,
        '<': less,
        '<=': lesseq,
        '>': greater,
        '>=': greatereq,
        '==': isequal,
        '!=': isnotequal,
        '&&': bool_and,
        '||': bool_or,
        'not': bool_not,
        'if': if_func,
        'ifelse': ifelse_func,
        '!': store,
        '@': fetch,
        'del': remove,
        'call': run,
        'dup': dup,
        'swap': swap,
        'pop': pop,
        'stack': showstack,
        'vars': showvars
        }

def shell(interp):
    try:
        while True:
            x = raw_input("star>   ")
            msg = None
            try:
                interp.run(shlex.split(x))
            except KeyError:
                msg = "does not exist"
            except:
                sys.excepthook(*sys.exc_info())
                msg = "parse error!"
            if msg != None:
                print "   =>",msg,"\n"
            else:
                print "   => %r\n" % (interp.get_stack(),)
    except (EOFError, KeyboardInterrupt):
        print

interp = Interpreter()
if len(sys.argv) > 1:
    lex = shlex.shlex(open(sys.argv[1], 'r'), sys.argv[1])
    tok = shlex.get_token()
    while tok is not None:
        interp.processToken(tok)
        tok = lex.get_token()
shell(interp)

This new version supports an if and ifelse statement. With this and function calls, it's possible to implement the fib and fact functions in the language. I will add in how you would define these later.

Here is how you would define the fib function:

star>   fib [ dup [ pop 1 0 + ] swap [ dup 1 - fib @ call swap 2 - fib @ call + ] swap 0 + 2 0 + < ifelse ] !
   => []

star>   15 fib @ call
   => [987.0]

The 0 + 2 0 + sequence before the < is to force the comparison to be a numeric comparison.

Also notice how the [ and ] single characters are quoting operators. They cause everything between them to be not executed and instead stored on the stack as a single list of items. This is the key to defining functions. Functions are a sequence of tokens that you can execute with the call operator. They can also be used for 'anonymous blocks' that are sort of a cross between lambda expressions and a standard Python block. These are used in the fib function for the two possible paths of the ifelse statement.

The parser for this is ridiculously simple. And shlex is plenty powerful enough as a lexer for this simple language. Other projects would be fetching individual items out of a list. Creating a new list that consists of only a portion of a previous list. 'Listifying' a single token on the stack. Implementation of a while primitive. Numeric operators that operate on integers (in actual Forth, the numeric operations by default operate on integers and you need to specify something like +. to get a floating point version). And some operations on symbol tokens that allow string manipulation. Perhaps a 'split' and 'join' operation that turns a token into a list of individual tokens for the characters or joins a list together into a single token would be sufficient.



回答2:

The right answer depends on what you are worried about. If you are worried about having a scale-able solution, where the language complexity will grow, you probably should start learning/using one of the parser modules. That is potentially an answer if you are worried about performance, as some of the modules are likely to be better optimized than what you could easily generate by hand.

If, on the other hand, what you are interested in is learning, check out the shunting yard algorithm. You could probably create a dictionary of functions (which will be faster than you if statement) with the operations along the lines of:

funcs = {}
funcs["+"] = lambda x, y: x + y
funcs["*"] = lambda x, y: y * y

Then in your Simp function you can call

func = funcs.get[Op]
if func is not None:
    func[Op](num1,num2)
else:
    #whatever you want to do here


回答3:

What you need is to convert a sequence of symbols (numbers, operation on numbers, parenthesis) into a tree structure, that represents the calculation expressed by your sequence of symbols. Such a thing is exactly the job of a ''parser'' You might want to have a look at simple parsers like this http://en.wikipedia.org/wiki/LL_parser Those are simple to code, and you can compute the parsing tables with a pencil and paper.



回答4:

Looks like you're trying to write something like this Forth in python.



回答5:

You could have a dictionary where to store variables and associate them with a function name.

For instance, let's say you're reading line by line your code:

a = 1
b = 2
c = a + b
function x() 
   d = 4
   e = 5
   f = d + e 
end

When you spot variables ( a,b,c ) you store them in a a list and that list within a scope, this could be the global scope something along the lines:

variables = scopes["global"]
variables.append( "a" ) 

You could have a similar data structure for functions, so when you spot a function you add it to that structure:

fs = functions["global"]
fs.append("x")

And you also add a new "scope" in the scope dictionary

scopes["x"] = [] // the function x doesn't have any var 

When you find a new var and if you're inside a function definition you store that new var in that "scope"

variables = scopes["x"]
variables.append("d")

Does it make sense?

All this should be doable in your current implementation. If you want to do something more serious I strongly recommend you to buy this book

http://pragprog.com/titles/tpdsl/language-implementation-patterns

Even though it is written using Java as example, it will give you solid foundations language apps and it is very easy to read.

Then you should have the tools to :

  1. Create a Lexer ( split input into tokens )
  2. A Parse ( validate if the tokens follow your language rules )
  3. An AST ( to walk your input )
  4. Identify program symbols ( like vars and functions )
  5. Build an interpreter.

I hope this helps