I made a function to evaluate the following problem experimentally, taken from a A Primer for the Mathematics of Financial Engineering.
Problem: Let X be the number of times you must flip a fair coin until it lands heads. What are E[X] (expected value) and var(X) (variance)?
Following the textbook solution, the following code yields the correct answer:
from sympy import *
k = symbols('k')
Expected_Value = summation(k/2**k, (k, 1, oo)) # Both solutions work
Variance = summation(k**2/2**k, (k, 1, oo)) - Expected_Value**2
To validate this answer, I decided to have a go at making a function to simulate this experiment. The following code is what I came up with.
def coin_toss(toss, probability=[0.5, 0.5]):
"""Computes expected value and variance for coin toss experiment"""
flips = [] # Collects total number of throws until heads appear per experiment.
for _ in range(toss): # Simulate n flips
number_flips=[] # Number of flips until heads is tossed
while sum(number_flips) == 0: # Continue simulation while Tails are thrown
number_flips.append(np.random.choice(2, p=probability)) # Append result to number_flips
flips.append(len(number_flips)) #Append number of flips until lands heads to flips
Expected_Value, Variance = np.mean(flips), np.var(flips)
print('E[X]: {}'.format(Expected_Value),
'\nvar[X]: {}'.format(Variance)) # Return expected value
The run time if I simulate 1e6 experiments, using the following code is approximately 35.9 seconds.
from timeit import Timer
t1 = Timer("""coin_toss(1000000)""", """from __main__ import coin_toss""")
print(t1.timeit(1))
In the interest of developing my understanding of Python, is this a particularly efficient/pythonic way of approaching a problem like this? How can I utilise existing libraries to improve efficiency/flow execution?
In order to code in an efficient and pythonic way, you must take a look at PythonSpeed and NumPy. One exemple of a faster code using numpy can be found below.
The abc of optimizing in python+numpy is to vectorize operations, which in this case is quite dificult because there is a
while
that could actually be infinite, the coin can be flipped tails 40 times in a row. However, instead of doing afor
withtoss
iterations, the work can be done in chunks. That is the main difference betweencoin_toss
from the question and thiscoin_toss_2d
approach.coin_toss_2d
The main advantatge of
coin_toss_2d
is working by chunks, the size of these chunks has some default values, but they can be modified (as they will affect speed). Thus, it will only iterate over thewhile current_toss<toss
a number of timestoss%flips_at_a_time
. This is achieved with numpy, which allows to generate a matrix with the results of repeatingflips_at_a_time
times the experiment of flipping a coinflips_per_try
times. This matrix will contain 0 (tails) and 1 (heads).Once this result is obtained,
argmax
is called. This finds the index corresponding to the maximum (which will be 1, a head) of each row (repetition) and in case of multiple occurences, returns the first one, which is exactly what is needed, the first head after a sequence of tails.However, the case where all the row is 0 must be considered. In this case, the maximum will be 0, and its first occurence will also be 0, the first column (try). Therefore, we check that all the maximums found at the first try correspond to a head at the first try.
If that is not the case, we loop repeating that same process but only for the repetitions where there was no head in any of the tries.
Then the indexes corresponding to the first head occurence are stored (we have to add 1 because python indexing starts at zero instead of 1). There is one
try ... except ...
block to cope with cases wheretoss
is not a multiple ofrepetitions_at_a_time
.coin_toss_map
Here the code is basically the same, but now, the intrinsec while is done in a separate function, which is called from the wrapper function
coin_toss_map
usingmap
.Performance comparison
In my computer, I got the following computation time:
And the results for the mean and variance are: