One of my first projects realized in python does Monte Carlo simulation of stick percolation. The code grew continually. The first part was the visualization of the stick percolation. In an area of width*length a defined density (sticks/area) of straight sticks with a certain length are plotted with random start coordinates and direction. As I often use gnuplot I wrote the generated (x, y) start and end coordinates to a text file to gnuplot them afterwards.
I then found here a nice way to analyze the image data using scipy.ndimage.measurements. The image is read by using ndimage.imread in greyscales. The resulting numpy array is further reduced to boolean values as I am only interested in connections between different sticks. The resulting clusters are then analyzed with ndimage.measurements. This allows me to find out if there are pathways that connect from one side to the other or not. A minimized example is here.
import random
import math
from scipy.ndimage import measurements
from scipy.ndimage import imread
import numpy as np
import matplotlib.pyplot as plt
#dimensions of plot
width = 10
length = 8
stick_length = 1
fig = plt.figure(frameon=False)
ax = fig.add_axes([0, 0, 1, 1])
fig.set_figwidth(width)
fig.set_figheight(length)
ax.axis('off')
file = open("coordinates.txt", "w")
for i in range (300):
# randomly create (x,y) start coordinates in channel and direction
xstart = width * random.random() # xstart = 18
ystart = length * random.random() # ystart = 2
# randomly generate direction of stick from start coordinates and convert from GRAD in RAD
dirgrad = 360 * random.random()
dirrad = math.radians(dirgrad)
# calculate (x,y) end coordinates
xend = xstart + (math.cos(dirrad) * stick_length)
yend = ystart + (math.sin(dirrad) * stick_length)
# write start and end coordinates into text file for gnuplot plotting
file.write(str(i) + ":\t" + str(xstart) + "\t" + str(ystart) + "\t" + str(dirgrad) + ":\t" + str(xend) + "\t" + str(yend) + "\n")
file.write(str(i) + ":\t" + str(xend) + "\t" + str(yend) + "\n\n")
# or plot directly with matplotlib
ax.plot([xstart,xend],[ystart,yend],"black", lw=1)
fig.savefig("testimage.png", dpi=100)
# now read just saved image and do analysis with scipy.ndimage
fig1, ax1 = plt.subplots(1,1)
img_input = imread("testimage.png", flatten = True) # read image to np.ndarray in grey scales
img_bw = img_input < 255 # convert grey scales to b/w (boolean)
labeled_array, num_clusters = measurements.label(img_bw) #labeled_array: labeled clusters in array, num_clusters: number of clusters
area = measurements.sum(img_bw, labeled_array, index=np.arange(labeled_array.max() + 1)) # area of each cluster
areaImg = area[labeled_array] # label each cluster with labelnumber=area
cax = ax1.imshow(areaImg, origin='upper', interpolation='nearest', cmap = 'rainbow')
cbar = fig1.colorbar(cax)
fig1.savefig("testimage_analyzed.png")
While this works principally just fine Monte Carlo simulations with 1000 iterations for a larger number of different stick densities end up running 8 hours or more. This is partly due to the fact, that the created images & arrays are quite large and thousands of sticks are plotted for higher densities. The reason is that I want to simulate a range of geometries (eg length between 500 and 20000 pixels) while minimizing the error due to the pixelization.
I guess the best way would be not to use image data and treat it as a vector problem, but I have no idea how to even start an algorithm. And the many connections might result in large data arrays as well.
Staying with the above described method it is obvious that writing data to a file and re-reading it is not very effective. I am therefore looking for ways to speed this up. As a first step I used matplotlib to create the image however at least when plotting each stick with a separate plot call this is up to 10x slower for a larger number of sticks. Creating the list of stick coordinates in an array and plotting the complete list with one plot call might speed it up but still leaves the bottleneck of writing and reading the image.
Can you point me to an effective method to directly generate the boolean type numpy array representing the black and white image of the sticks? Maybe plot the list of coordinates and convert the figure in some way to an array? I also found this interesting discussion where lines are plotted to a PIL image. Might this be faster than matplotlib?