Is there a way to autogenerate valid arithmetic ex

2020-07-10 06:28发布

问题:

I'm currently trying to create a Python script that will autogenerate space-delimited arithmetic expressions which are valid. However, I get sample output that looks like this: ( 32 - 42 / 95 + 24 ( ) ( 53 ) + ) 21

While the empty parentheses are perfectly OK by me, I can't use this autogenerated expression in calculations since there's no operator between the 24 and the 53, and the + before the 21 at the end has no second argument.

What I want to know is, is there a way to account for/fix these errors using a Pythonic solution? (And before anyone points it out, I'll be the first to acknowledge that the code I posted below is probably the worst code I've pushed and conforms to...well, very few of Python's core tenets.)

import random
parentheses = ['(',')']
ops = ['+','-','*','/'] + parentheses

lines = 0

while lines < 1000:
    fname = open('test.txt','a')
    expr = []
    numExpr = lines
    if (numExpr % 2 == 0):
        numExpr += 1
    isDiv = False # Boolean var, makes sure there's no Div by 0

    # isNumber, isParentheses, isOp determine whether next element is a number, parentheses, or operator, respectively
    isNumber = random.randint(0,1) == 0 # determines whether to start sequence with number or parentheses
    isParentheses = not isNumber
    isOp = False
    # Counts parentheses to ensure parentheses are matching
    numParentheses = 0
    while (numExpr > 0 or numParentheses > 0):
        if (numExpr < 0 and numParentheses > 0):
            isDiv = False
            expr.append(')')
            numParentheses -= 1
        elif (isOp and numParentheses > 0):
            rand = random.randint(0,5)
            expr.append(ops[rand])
            isDiv = (rand == 3) # True if div op was just appended
            # Checks to see if ')' was appended
            if (rand == 5):
                isNumber = False
                isOp = True
                numParentheses -= 1
            # Checks to see if '(' was appended
            elif (rand == 4):
                isNumber = True
                isOp = False
                numParentheses += 1
            # All other operations go here
            else:
                isNumber = True
                isOp = False
        # Didn't add parentheses possibility here in case expression in parentheses somehow reaches 0
        elif (isNumber and isDiv):
            expr.append(str(random.randint(1,100)))
            isDiv = False
            isNumber = False
            isOp = True
        # If a number's up, decides whether to append parentheses or a number
        elif (isNumber):
            rand = random.randint(0,1)
            if (rand == 0):
                expr.append(str(random.randint(0,100)))
                isNumber = False
                isOp = True
            elif (rand == 1):
                if (numParentheses == 0):
                    expr.append('(')
                    numParentheses += 1
                else:
                    rand = random.randint(0,1)
                    expr.append(parentheses[rand])
                    if rand == 0:
                        numParentheses += 1
                    else:
                        numParentheses -= 1
            isDiv = False
        numExpr -= 1

    fname.write(' '.join(expr) + '\n')
    fname.close()
    lines += 1

回答1:

Yes, you can generate random arithmetic expressions in a Pythonic way. You need to change your approach, though. Don't try to generate a string and count parens. Instead generate a random expression tree, then output that.

By an expression tree, I mean an instance of a class called, say, Expression with subclasses Number, PlusExpression,MinusExpression, 'TimesExpression, DivideExpression, and ParenthesizedExpression. Each of these, except Number will have fields of type Expression. Give each a suitable __str__ method. Generate some random expression objects and just print the "root."

Can you take it from here or would you like me to code it up?

ADDENDUM: Some sample starter code. Doesn't generate random expressions (yet?) but this can be added....

# This is just the very beginning of a script that can be used to process
# arithmetic expressions.  At the moment it just defines a few classes
# and prints a couple example expressions.

# Possible additions include methods to evaluate expressions and generate
# some random expressions.

class Expression:
    pass

class Number(Expression):
    def __init__(self, num):
        self.num = num

    def __str__(self):
        return str(self.num)

class BinaryExpression(Expression):
    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

    def __str__(self):
        return str(self.left) + " " + self.op + " "  + str(self.right)

class ParenthesizedExpression(Expression):
    def __init__(self, exp):
        self.exp = exp

    def __str__(self):
        return "(" + str(self.exp) + ")"

e1 = Number(5)
print e1

e2 = BinaryExpression(Number(8), "+", ParenthesizedExpression(BinaryExpression(Number(7), "*", e1)))
print e2

** ADDENDUM 2 **

Getting back into Python is really fun. I couldn't resist implementing the random expression generator. It is built on the code above. SORRY ABOUT THE HARDCODING!!

from random import random, randint, choice

def randomExpression(prob):
    p = random()
    if p > prob:
        return Number(randint(1, 100))
    elif randint(0, 1) == 0:
        return ParenthesizedExpression(randomExpression(prob / 1.2))
    else:
        left = randomExpression(prob / 1.2)
        op = choice(["+", "-", "*", "/"])
        right = randomExpression(prob / 1.2)
        return BinaryExpression(left, op, right)

for i in range(10):
    print(randomExpression(1))

Here is the output I got:

(23)
86 + 84 + 87 / (96 - 46) / 59
((((49)))) + ((46))
76 + 18 + 4 - (98) - 7 / 15
(((73)))
(55) - (54) * 55 + 92 - 13 - ((36))
(78) - (7 / 56 * 33)
(81) - 18 * (((8)) * 59 - 14)
(((89)))
(59)

Ain't tooooo pretty. I think it puts out too many parents. Maybe change the probability of choosing between parenthesized expressions and binary expressions might work well....



回答2:

Actually, as long as Ray Toal's response is formally correct, for such a simple problem you don't have to subclass each operator*. I came up with the following code which works pretty well:

import random
import math


class Expression(object):
    OPS = ['+', '-', '*', '/']

    GROUP_PROB = 0.3

    MIN_NUM, MAX_NUM = 0, 20

    def __init__(self, maxNumbers, _maxdepth=None, _depth=0):
        """
        maxNumbers has to be a power of 2
        """
        if _maxdepth is None:
            _maxdepth = math.log(maxNumbers, 2) - 1

        if _depth < _maxdepth and random.randint(0, _maxdepth) > _depth:
            self.left = Expression(maxNumbers, _maxdepth, _depth + 1)
        else:
            self.left = random.randint(Expression.MIN_NUM, Expression.MAX_NUM)

        if _depth < _maxdepth and random.randint(0, _maxdepth) > _depth:
            self.right = Expression(maxNumbers, _maxdepth, _depth + 1)
        else:
            self.right = random.randint(Expression.MIN_NUM, Expression.MAX_NUM)

        self.grouped = random.random() < Expression.GROUP_PROB
        self.operator = random.choice(Expression.OPS)

    def __str__(self):
        s = '{0!s} {1} {2!s}'.format(self.left, self.operator, self.right)
        if self.grouped:
            return '({0})'.format(s)
        else:
            return s


for i in range(10):
    print Expression(4)

It can although be improved to take into considerations things like divisions by zero (not handled currently), customization of all parameters through attributes, allowing any value for the maxNumbers argument and so on.

* By "simple problem" I mean "generating valid expressions"; if you are adding any other functionality (for example, expression evaluation), then Ray's approach will pay of because you can define the behavior of each subclass in a much cleaner way.

Edit (output):

(5 * 12 / 16)
6 * 3 + 14 + 0
13 + 15 - 1
19 + (8 / 8)
(12 + 3 - 5)
(4 * 0 / 4)
1 - 18 / (3 * 15)
(3 * 16 + 3 * 1)
(6 + 16) / 16
(8 * 10)


回答3:

I found this thread on a similar quest, namely to generate random expressions for unit testing of symbolic calculations. In my version, I included unary functions and allowed the symbols to be arbitrary strings, i.e. you can use numbers or variable names.

from random import random, choice

UNARIES = ["sqrt(%s)", "exp(%s)", "log(%s)", "sin(%s)", "cos(%s)", "tan(%s)",
           "sinh(%s)", "cosh(%s)", "tanh(%s)", "asin(%s)", "acos(%s)",
           "atan(%s)", "-%s"]
BINARIES = ["%s + %s", "%s - %s", "%s * %s", "%s / %s", "%s ** %s"]

PROP_PARANTHESIS = 0.3
PROP_BINARY = 0.7

def generate_expressions(scope, num_exp, num_ops):
    scope = list(scope) # make a copy first, append as we go
    for _ in xrange(num_ops):
        if random() < PROP_BINARY: # decide unary or binary operator
            ex = choice(BINARIES) % (choice(scope), choice(scope))
            if random() < PROP_PARANTHESIS:
                ex = "(%s)" % ex
            scope.append(ex)
        else:
            scope.append(choice(UNARIES) % choice(scope))
    return scope[-num_exp:] # return most recent expressions

As copied from pervious answers, I just throw in some paranthesis around binary operators with probability PROP_PARANTHESIS (that is a bit of cheating). Binary operators are more common than unary ones, so I left it also for configuration (PROP_BINARY). An example code is:

scope = [c for c in "abcde"]
for expression in generate_expressions(scope, 10, 50):
    print expression

This will generate something like:

e / acos(tan(a)) / a * acos(tan(a)) ** (acos(tan(a)) / a + a) + (d ** b + a)
(a + (a ** sqrt(e)))
acos((b / acos(tan(a)) / a + d) / (a ** sqrt(e)) * (a ** sinh(b) / b))
sin(atan(acos(tan(a)) ** (acos(tan(a)) / a + a) + (d ** b + a)))
sin((b / acos(tan(a)) / a + d)) / (a ** sinh(b) / b)
exp(acos(tan(a)) / a + acos(e))
tan((b / acos(tan(a)) / a + d))
acos(tan(a)) / a * acos(tan(a)) ** (acos(tan(a)) / a + a) + (d ** b + a) + cos(sqrt(e))
(acos(tan(a)) / a + acos(e) * a + e)
((b / acos(tan(a)) / a + d) - cos(sqrt(e))) + sinh(b)

Putting PROP_BINARY = 1.0 and applying with

scope = range(100)

brings us back to output like

43 * (50 * 83)
34 / (29 / 24)
66 / 47 - 52
((88 ** 38) ** 40)
34 / (29 / 24) - 27
(16 + 36 ** 29)
55 ** 95
70 + 28
6 * 32
(52 * 2 ** 37)


回答4:

Ok, I couldn't resist adding my own implementation using some of the ideas we discussed in Ray's answer. I approached a few things differently than Ray did.

I added some handling of the probability of the incidence of each operator. The operators are biased so that the lower priority operators (larger precedence values) are more common than the higher order ones.

I also implemented parentheses only when precedence requires. Since the integers have the highest priority (lowest precedence value) they never get wrapped in parentheses. There is no need for a parenthesized expression as a node in the expression tree.

The probability of using an operator is biased towards the initial levels (using a quadratic function) to get a nicer distribution of operators. Choosing a different exponent gives more potential control of the quality of the output, but I didn't play with the possibilities much.

I further implemented an evaluator for fun and also to filter out indeterminate expressions.

import sys
import random

# dictionary of operator precedence and incidence probability, with an
# evaluator added just for fun.
operators = {
    '^': {'prec': 10, 'prob': .1, 'eval': lambda a, b: pow(a, b)},
    '*': {'prec': 20, 'prob': .2, 'eval': lambda a, b: a*b},
    '/': {'prec': 20, 'prob': .2, 'eval': lambda a, b: a/b},
    '+': {'prec': 30, 'prob': .25, 'eval': lambda a, b: a+b},
    '-': {'prec': 30, 'prob': .25, 'eval': lambda a, b: a-b}}

max_levels = 3
integer_range = (-100, 100)
random.seed()

# A node in an expression tree
class expression(object):
    def __init__(self):
        super(expression, self).__init__()

    def precedence(self):
        return -1

    def eval(self):
        return 0

    @classmethod
    def create_random(cls, level):
        if level == 0:
            is_op = True
        elif level == max_levels:
            is_op = False
        else:
            is_op = random.random() <= 1.0 - pow(level/max_levels, 2.0)

        if is_op:
            return binary_expression.create_random(level)
        else:
            return integer_expression.create_random(level)

class integer_expression(expression):
    def __init__(self, value):
        super(integer_expression, self).__init__()

        self.value = value

    def __str__(self):
        return self.value.__str__()

    def precedence(self):
        return 0

    def eval(self):
        return self.value

    @classmethod
    def create_random(cls, level):
        return integer_expression(random.randint(integer_range[0],
                                                 integer_range[1]))

class binary_expression(expression):
    def __init__(self, symbol, left_expression, right_expression):
        super(binary_expression, self).__init__()

        self.symbol = symbol
        self.left = left_expression
        self.right = right_expression

    def eval(self):
        f = operators[self.symbol]['eval']
        return f(self.left.eval(), self.right.eval())

    @classmethod
    def create_random(cls, level):
        symbol = None

        # Choose an operator based on its probability distribution
        r = random.random()
        cumulative = 0.0
        for k, v in operators.items():
            cumulative += v['prob']
            if r <= cumulative:
                symbol = k
                break

        assert symbol != None

        left = expression.create_random(level + 1)
        right = expression.create_random(level + 1)

        return binary_expression(symbol, left, right)

    def precedence(self):
        return operators[self.symbol]['prec']

    def __str__(self):
        left_str = self.left.__str__()
        right_str = self.right.__str__()
        op_str = self.symbol

        # Use precedence to determine if we need to put the sub expressions in
        # parentheses
        if self.left.precedence() > self.precedence():
            left_str = '('+left_str+')'
        if self.right.precedence() > self.precedence():
            right_str = '('+right_str+')'

        # Nice to have space around low precedence operators
        if operators[self.symbol]['prec'] >= 30:
            op_str = ' ' + op_str + ' '

        return left_str + op_str + right_str

max_result = pow(10, 10)
for i in range(10):
    expr = expression.create_random(0)

    try:
        value = float(expr.eval())
    except:
        value = 'indeterminate'

    print expr, '=', value

I got these results:

(4 + 100)*41/46 - 31 - 18 - 2^-83 = -13.0
(43 - -77)/37^-94 + (-66*67)^(-24*49) = 3.09131533541e+149
-32 - -1 + 74 + 74 - 15 + 64 - -22/98 = 37.0
(-91*-4*45*-55)^(-9^2/(82 - -53)) = 1.0
-72*-85*(75 - 65) + -100*19/48*22 = 61198.0
-57 - -76 - -54*76 + -38 - -23 + -17 - 3 = 4088.0
(84*-19)^(13 - 87) - -10*-84*(-28 + -49) = 64680.0
-69 - -8 - -81^-51 + (53 + 80)^(99 - 48) = 2.07220963807e+108
(-42*-45)^(12/87) - -98 + -23 + -67 - -37 = 152.0
-31/-2*-58^-60 - 33 - -49 - 46/12 = -79.0

There are a couple of things the program does, that although are valid, a human wouldn't do. For example:

  1. It can create long strings of sequential divides (e.g. 1/2/3/4/5).
  2. +/- of a negative number is common (e.g. 1 - -2)

These can be corrected with a clean-up pass.

Also, there is no guarantee that the answer is determinate. Divides by 0 and 0^0 are possible, although with the exception handling these can be filtered out.



回答5:

import random

def expr(depth):
    if depth==1 or random.random()<1.0/(2**depth-1): 
        return str(int(random.random() * 100))
    return '(' + expr(depth-1) + random.choice(['+','-','*','/']) + expr(depth-1) + ')'

for i in range(10):
    print expr(4)


回答6:

Generate an array at random in RPN with mixtures of operators and numbers (always valid). Then start from middle of the array and generate the corresponding evaluation tree.