Directly “plot” line segments to numpy array

One of my first projects realized in python does Monte Carlo simulation of stick percolation. The code grew continually. The first part was the visualization of the stick percolation. In an area of width*length a defined density (sticks/area) of straight sticks with a certain length are plotted with random start coordinates and direction. As I often use gnuplot I wrote the generated (x, y) start and end coordinates to a text file to gnuplot them afterwards.

I then found here a nice way to analyze the image data using scipy.ndimage.measurements. The image is read by using ndimage.imread in greyscales. The resulting numpy array is further reduced to boolean values as I am only interested in connections between different sticks. The resulting clusters are then analyzed with ndimage.measurements. This allows me to find out if there are pathways that connect from one side to the other or not. A minimized example is here.

import random
import math
from scipy.ndimage import measurements
from scipy.ndimage import imread
import numpy as np
import matplotlib.pyplot as plt

#dimensions of plot
width = 10
length = 8
stick_length = 1
fig = plt.figure(frameon=False)
ax = fig.add_axes([0, 0, 1, 1])
fig.set_figwidth(width)
fig.set_figheight(length)
ax.axis('off')

file = open("coordinates.txt", "w")

for i in range (300):
    # randomly create (x,y) start coordinates in channel and direction
    xstart = width * random.random()        # xstart = 18
    ystart = length * random.random()        # ystart = 2
    # randomly generate direction of stick from start coordinates and convert from GRAD in RAD
    dirgrad = 360 * random.random()
    dirrad = math.radians(dirgrad)
    # calculate (x,y) end coordinates
    xend = xstart + (math.cos(dirrad) * stick_length)
    yend = ystart + (math.sin(dirrad) * stick_length)
    # write start and end coordinates into text file for gnuplot plotting
    file.write(str(i) + ":\t" + str(xstart) + "\t" + str(ystart) + "\t" + str(dirgrad) + ":\t" + str(xend) + "\t" + str(yend) + "\n")
    file.write(str(i) + ":\t" + str(xend) + "\t" + str(yend) + "\n\n")
    # or plot directly with matplotlib
    ax.plot([xstart,xend],[ystart,yend],"black", lw=1)
fig.savefig("testimage.png", dpi=100)

# now read just saved image and do analysis with scipy.ndimage
fig1, ax1 = plt.subplots(1,1)
img_input = imread("testimage.png", flatten = True) # read image to np.ndarray in grey scales
img_bw = img_input < 255 # convert grey scales to b/w (boolean)
labeled_array, num_clusters = measurements.label(img_bw) #labeled_array: labeled clusters in array, num_clusters: number of clusters
area = measurements.sum(img_bw, labeled_array, index=np.arange(labeled_array.max() + 1)) # area of each cluster
areaImg = area[labeled_array] # label each cluster with labelnumber=area
cax = ax1.imshow(areaImg, origin='upper', interpolation='nearest', cmap = 'rainbow')
cbar = fig1.colorbar(cax)
fig1.savefig("testimage_analyzed.png")

While this works principally just fine Monte Carlo simulations with 1000 iterations for a larger number of different stick densities end up running 8 hours or more. This is partly due to the fact, that the created images & arrays are quite large and thousands of sticks are plotted for higher densities. The reason is that I want to simulate a range of geometries (eg length between 500 and 20000 pixels) while minimizing the error due to the pixelization.

I guess the best way would be not to use image data and treat it as a vector problem, but I have no idea how to even start an algorithm. And the many connections might result in large data arrays as well.

Staying with the above described method it is obvious that writing data to a file and re-reading it is not very effective. I am therefore looking for ways to speed this up. As a first step I used matplotlib to create the image however at least when plotting each stick with a separate plot call this is up to 10x slower for a larger number of sticks. Creating the list of stick coordinates in an array and plotting the complete list with one plot call might speed it up but still leaves the bottleneck of writing and reading the image.

Can you point me to an effective method to directly generate the boolean type numpy array representing the black and white image of the sticks? Maybe plot the list of coordinates and convert the figure in some way to an array? I also found this interesting discussion where lines are plotted to a PIL image. Might this be faster than matplotlib?

Drawing a line segment in an array is a fundamental capability of any graphics library. The simplest method is probably Bresenham's algorithm. The algorithm is simple and fast--when implemented in a fast language, that is. I wouldn't recommend implementing it in pure python. A drawback of the simplest version of the algorithm is that it is not anti-aliased. The lines show "jaggies". Search for "line drawing algorithms" for more advanced methods with better anti-aliasing.

I have a Cython implementation of Bresenham's algorithm in my eyediagram package. The function bres_segment_count increments the values in the input array along the straight line from (x0, y0) to (x1, y1). A modification that simply sets the array values to 1 would be a trivial change to that code.

For example,

In [21]: dim = 250

In [22]: num_sticks = 300

Each row of sticks contains [x0, y0, x1, y1], the end points of a "stick":

In [23]: sticks = np.random.randint(0, dim, size=(num_sticks, 4)).astype(np.int32)

In [24]: img = np.zeros((dim, dim), dtype=np.int32)

bres_segments_count draws each stick using Bresenham's algorithm. Note that instead of simply setting a value in the line to, say, 1, the values in img along the line are incremented.

In [25]: from eyediagram._brescount import bres_segments_count

In [26]: bres_segments_count(sticks, img)

In [27]: plt.imshow(img, interpolation='nearest', cmap=cm.hot)
Out[27]: <matplotlib.image.AxesImage at 0x10f94b110>

Here's the plot that is generated: