I have a program doing a LOT of iterations (from thousands to millions, sometimes hundreds of millions). It's starting to take quite a lot of time (from a few minutes to a few days), and despite all my efforts to optimize it, I'm still a bit stuck.
Profile (using cProfile via a console call):
ncalls tottime percall cumtime percall filename:lineno(function)
500/1 0.018 0.000 119.860 119.860 {built-in method builtins.exec}
1 0.006 0.006 119.860 119.860 Simulations_profiling.py:6(<module>)
6/3 0.802 0.134 108.302 36.101 Simulations_profiling.py:702(optimization)
38147 0.581 0.000 103.411 0.003 Simulations_profiling.py:270(overlap_duo_combination)
107691 28.300 0.000 102.504 0.001 Simulations_profiling.py:225(duo_overlap)
12615015 69.616 0.000 69.616 0.000 {built-in method builtins.round}
My first question: what are the first two lines? I assume they correspond to the program being called.
I'm going to replace the round() calls with a tolerance comparison in my if / else statements, which should avoid that time consumption (a sketch of what I mean is shown below). I would also like to further optimize the two following functions, but I can't find a new approach.
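By "tolerance comparison" I mean something like this minimal sketch (overlaps() and its tol value are placeholders I made up, not part of my current code; tol would still have to be tuned so that it behaves like the old 3-decimal rounding):

def overlaps(t1, t2, width, tol=5e-4):
    # Replaces round(abs(t2 - t1), 3) < width with a direct comparison;
    # tol is roughly half of the 3-decimal rounding step (hypothetical value).
    return abs(t2 - t1) < width - tol

The idea is to call something like this in the if / elif of duo_overlap() instead of computing d with round(). Here is the current code: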
import itertools
import numpy as np


class Signal:
    def __init__(self, fq, t0, tf, width=0.3):
        self.fq = fq                # frequency in Hz
        self.width = float(width)   # cathodic phase width in ms
        self.t0 = t0                # instant of the first pulse in ms
        self.tf = tf                # end point in ms

        # List of instants at which a stim pulse is triggered, in ms
        self.timeline = np.round(np.arange(t0, self.tf, 1 / fq * 1000), 3)

    def find_closest_t(self, t):
        val = min(self.timeline, key=lambda x: abs(x - t))
        id = np.where(self.timeline == val)[0][0]

        if val <= t or id == 0:
            return val, id
        else:
            return self.timeline[id - 1], id - 1
def duo_overlap(S1, S2, perc):
    pulse_id_t1, pulse_id_t2 = [], []

    start = max(S1.t0, S2.t0) - max(S1.width, S2.width)
    if start <= 0:
        start = 0
        start_id_S1 = 0
        start_id_S2 = 0
    else:
        start_id_S1 = S1.find_closest_t(start)[1]
        start_id_S2 = S2.find_closest_t(start)[1]

    stop = min(S1.tf, S2.tf) + max(S1.width, S2.width)

    for i in range(start_id_S1, len(S1.timeline)):
        if S1.timeline[i] > stop:
            break

        for j in range(start_id_S2, len(S2.timeline)):
            if S2.timeline[j] > stop:
                break

            d = round(abs(S2.timeline[j] - S1.timeline[i]), 3)  # distance between the two pulses
            if S1.timeline[i] <= S2.timeline[j] and d < S1.width:
                pulse_id_t1.append(i)
                pulse_id_t2.append(j)
                continue
            elif S1.timeline[i] >= S2.timeline[j] and d < S2.width:
                pulse_id_t1.append(i)
                pulse_id_t2.append(j)
                continue
            else:
                continue

    return pulse_id_t1, pulse_id_t2
def overlap_duo_combination(signals, perc=0):
    overlap = dict()
    for i in range(len(signals)):
        overlap[i] = list()

    for subset in itertools.combinations(signals, 2):
        p1, p2 = duo_overlap(subset[0], subset[1], perc=perc)
        overlap[signals.index(subset[0])] += p1
        overlap[signals.index(subset[1])] += p2

    return overlap
Explanation of the program:
I have square signals of width Signal.width and frequency Signal.fq, starting at Signal.t0 and ending at Signal.tf. During the initialization of a Signal instance, the timeline is computed: it is a 1D array of floats in which each number corresponds to the instant at which a pulse is triggered.
Example:
IN: Signal(50, 0, 250).timeline
OUT: array([ 0., 20., 40., 60., 80., 100., 120., 140., 160., 180., 200., 220., 240.])
A pulse is triggered at t = 0, t = 20, t = 40, ... Each pulse has a width of 0.3 ms.
duo_overlap() takes two signals as input (and a percentage, which we will keep fixed at 0 for this example). This function computes the IDs of the pulses of S1 and of S2 (their indices in the timeline array) that overlap with one another.
Example:
If a pulse starts at t = 0 for S1 and a pulse starts at t = 0.2 for S2, since 0.2 - 0 = 0.2 < 0.3 (S1.width), the 2 pulses overlap.
I tried to optimize this function by looping only over the IDs where the pulses can possibly overlap (start_id, stop), computed ahead of time. But as you can see, this function is still really time-consuming because of the high number of calls.
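For concreteness, here is a small call illustrating the overlap rule above (the parameter values are just an example I made up, not my real data):

S1 = Signal(50, 0, 100)    # pulses at 0, 20, 40, 60, 80 ms
S2 = Signal(50, 0.2, 100)  # pulses at 0.2, 20.2, 40.2, 60.2, 80.2 ms

# Every S2 pulse starts 0.2 ms after an S1 pulse, and 0.2 < 0.3 (S1.width),
# so each pair of pulses overlaps:
print(duo_overlap(S1, S2, perc=0))
# should print: ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])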
The last function, overlap_duo_combination(), takes N signals as input in a list (or tuple / iterable) (2 <= N <= 6 in practice) and creates a dict() in which each key is the index of a signal in the input list, and the value is the list of its overlapping pulse IDs (the signals in the input list are compared two by two).
Example:
Input: signals = (S1, S2, S3). Pulse n°2 of S1 overlaps with pulse n°3 of S2, and pulse n°3 of S2 overlaps with pulse n°5 of S3.
Output: dict[0] = [2] / dict[1] = [3, 3] / dict[2] = [5]
The 3 appears twice for S2 because it is added the first time duo_overlap() is called on S1 and S2, and a second time when it is called on S2 and S3.
I don't want to remove duplicates, since they carry information about how many different pulses are overlapping (in this case, 2 pulses overlap with pulse n°3 of S2).
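To make this concrete, here is a small runnable example (the signal parameters are just values I made up to reproduce the situation above, not my real data):

S1 = Signal(50, 0, 100)    # pulses at 0, 20, 40, 60, 80 ms
S2 = Signal(50, 0.2, 100)  # each pulse 0.2 ms after an S1 pulse -> overlaps S1
S3 = Signal(50, 0.4, 100)  # each pulse 0.2 ms after an S2 pulse -> overlaps S2, but not S1 (0.4 >= 0.3)

print(overlap_duo_combination([S1, S2, S3]))
# should print (split over lines for readability):
# {0: [0, 1, 2, 3, 4],
#  1: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],   # every S2 pulse appears twice: once vs S1, once vs S3
#  2: [0, 1, 2, 3, 4]}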
Would you have any idea, suggestion, code or whatever to reduce the time complexity of this part of the code?
I am currently looking into a PyCUDA implementation, since I have an Nvidia 1080 Ti at my disposal, but I don't know the C language. Would it be worth moving to the GPU this inner function, which doesn't take long to execute per call but is called thousands of times?
Thanks for reading such a long post, and thanks for the help!