I'm currently trying to create a Python script that will autogenerate space-delimited arithmetic expressions which are valid. However, I get sample output that looks like this: ( 32 - 42 / 95 + 24 ( ) ( 53 ) + ) 21
While the empty parentheses are perfectly OK by me, I can't use this autogenerated expression in calculations since there's no operator between the 24 and the 53, and the + before the 21 at the end has no second argument.
What I want to know is, is there a way to account for/fix these errors using a Pythonic solution? (And before anyone points it out, I'll be the first to acknowledge that the code I posted below is probably the worst code I've pushed and conforms to...well, very few of Python's core tenets.)
import random
parentheses = ['(',')']
ops = ['+','-','*','/'] + parentheses
lines = 0
while lines < 1000:
fname = open('test.txt','a')
expr = []
numExpr = lines
if (numExpr % 2 == 0):
numExpr += 1
isDiv = False # Boolean var, makes sure there's no Div by 0
# isNumber, isParentheses, isOp determine whether next element is a number, parentheses, or operator, respectively
isNumber = random.randint(0,1) == 0 # determines whether to start sequence with number or parentheses
isParentheses = not isNumber
isOp = False
# Counts parentheses to ensure parentheses are matching
numParentheses = 0
while (numExpr > 0 or numParentheses > 0):
if (numExpr < 0 and numParentheses > 0):
isDiv = False
expr.append(')')
numParentheses -= 1
elif (isOp and numParentheses > 0):
rand = random.randint(0,5)
expr.append(ops[rand])
isDiv = (rand == 3) # True if div op was just appended
# Checks to see if ')' was appended
if (rand == 5):
isNumber = False
isOp = True
numParentheses -= 1
# Checks to see if '(' was appended
elif (rand == 4):
isNumber = True
isOp = False
numParentheses += 1
# All other operations go here
else:
isNumber = True
isOp = False
# Didn't add parentheses possibility here in case expression in parentheses somehow reaches 0
elif (isNumber and isDiv):
expr.append(str(random.randint(1,100)))
isDiv = False
isNumber = False
isOp = True
# If a number's up, decides whether to append parentheses or a number
elif (isNumber):
rand = random.randint(0,1)
if (rand == 0):
expr.append(str(random.randint(0,100)))
isNumber = False
isOp = True
elif (rand == 1):
if (numParentheses == 0):
expr.append('(')
numParentheses += 1
else:
rand = random.randint(0,1)
expr.append(parentheses[rand])
if rand == 0:
numParentheses += 1
else:
numParentheses -= 1
isDiv = False
numExpr -= 1
fname.write(' '.join(expr) + '\n')
fname.close()
lines += 1
Yes, you can generate random arithmetic expressions in a Pythonic way. You need to change your approach, though. Don't try to generate a string and count parens. Instead generate a random expression tree, then output that.
By an expression tree, I mean an instance of a class called, say,
Expression
with subclassesNumber
,PlusExpression,
MinusExpression, 'TimesExpression
,DivideExpression
, andParenthesizedExpression
. Each of these, exceptNumber
will have fields of typeExpression
. Give each a suitable__str__
method. Generate some random expression objects and just print the "root."Can you take it from here or would you like me to code it up?
ADDENDUM: Some sample starter code. Doesn't generate random expressions (yet?) but this can be added....
** ADDENDUM 2 **
Getting back into Python is really fun. I couldn't resist implementing the random expression generator. It is built on the code above. SORRY ABOUT THE HARDCODING!!
Here is the output I got:
Ain't tooooo pretty. I think it puts out too many parents. Maybe change the probability of choosing between parenthesized expressions and binary expressions might work well....
Generate an array at random in RPN with mixtures of operators and numbers (always valid). Then start from middle of the array and generate the corresponding evaluation tree.
I found this thread on a similar quest, namely to generate random expressions for unit testing of symbolic calculations. In my version, I included unary functions and allowed the symbols to be arbitrary strings, i.e. you can use numbers or variable names.
As copied from pervious answers, I just throw in some paranthesis around binary operators with probability
PROP_PARANTHESIS
(that is a bit of cheating). Binary operators are more common than unary ones, so I left it also for configuration (PROP_BINARY
). An example code is:This will generate something like:
Putting
PROP_BINARY = 1.0
and applying withbrings us back to output like
Actually, as long as Ray Toal's response is formally correct, for such a simple problem you don't have to subclass each operator*. I came up with the following code which works pretty well:
It can although be improved to take into considerations things like divisions by zero (not handled currently), customization of all parameters through attributes, allowing any value for the
maxNumbers
argument and so on.* By "simple problem" I mean "generating valid expressions"; if you are adding any other functionality (for example, expression evaluation), then Ray's approach will pay of because you can define the behavior of each subclass in a much cleaner way.
Edit (output):
Ok, I couldn't resist adding my own implementation using some of the ideas we discussed in Ray's answer. I approached a few things differently than Ray did.
I added some handling of the probability of the incidence of each operator. The operators are biased so that the lower priority operators (larger precedence values) are more common than the higher order ones.
I also implemented parentheses only when precedence requires. Since the integers have the highest priority (lowest precedence value) they never get wrapped in parentheses. There is no need for a parenthesized expression as a node in the expression tree.
The probability of using an operator is biased towards the initial levels (using a quadratic function) to get a nicer distribution of operators. Choosing a different exponent gives more potential control of the quality of the output, but I didn't play with the possibilities much.
I further implemented an evaluator for fun and also to filter out indeterminate expressions.
I got these results:
There are a couple of things the program does, that although are valid, a human wouldn't do. For example:
These can be corrected with a clean-up pass.
Also, there is no guarantee that the answer is determinate. Divides by 0 and 0^0 are possible, although with the exception handling these can be filtered out.