I have a program doing a LOT of iterations (from thousands to millions, sometimes hundreds of millions). It's starting to take quite a lot of time (from a few minutes to a few days), and despite all my efforts to optimize it, I'm still a bit stuck.
Profile (using cProfile via a console call):
ncalls tottime percall cumtime percall filename:lineno(function)
500/1 0.018 0.000 119.860 119.860 {built-in method builtins.exec}
1 0.006 0.006 119.860 119.860 Simulations_profiling.py:6(<module>)
6/3 0.802 0.134 108.302 36.101 Simulations_profiling.py:702(optimization)
38147 0.581 0.000 103.411 0.003 Simulations_profiling.py:270(overlap_duo_combination)
107691 28.300 0.000 102.504 0.001 Simulations_profiling.py:225(duo_overlap)
12615015 69.616 0.000 69.616 0.000 {built-in method builtins.round}
My first question: what are the first two lines? I assume they correspond to the program being called.
I'm going to replace the round() calls with a tolerance comparison in my if / else statements, which should avoid that time consumption (a sketch of what I mean is shown below). I would also like to further optimize the two following functions, but I can't find a new approach.
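By "tolerance comparison" I mean something like this minimal sketch (overlaps() and its tol value are placeholders I made up, not part of my current code; tol would still have to be tuned so that it behaves like the old 3-decimal rounding):

def overlaps(t1, t2, width, tol=5e-4):
    # Replaces round(abs(t2 - t1), 3) < width with a direct comparison;
    # tol is roughly half of the 3-decimal rounding step (hypothetical value).
    return abs(t2 - t1) < width - tol

The idea is to call something like this in the if / elif of duo_overlap() instead of computing d with round(). Here is the current code: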
import itertools
import numpy as np


class Signal:
    def __init__(self, fq, t0, tf, width=0.3):
        self.fq = fq                # frequency in Hz
        self.width = float(width)   # cathodic phase width in ms
        self.t0 = t0                # instant of the first pulse in ms
        self.tf = tf                # end point in ms

        # List of instants at which a stim pulse is triggered, in ms
        self.timeline = np.round(np.arange(t0, self.tf, 1 / fq * 1000), 3)

    def find_closest_t(self, t):
        val = min(self.timeline, key=lambda x: abs(x - t))
        id = np.where(self.timeline == val)[0][0]

        if val <= t or id == 0:
            return val, id
        else:
            return self.timeline[id - 1], id - 1
def duo_overlap(S1, S2, perc):
    pulse_id_t1, pulse_id_t2 = [], []

    start = max(S1.t0, S2.t0) - max(S1.width, S2.width)
    if start <= 0:
        start = 0
        start_id_S1 = 0
        start_id_S2 = 0
    else:
        start_id_S1 = S1.find_closest_t(start)[1]
        start_id_S2 = S2.find_closest_t(start)[1]

    stop = min(S1.tf, S2.tf) + max(S1.width, S2.width)

    for i in range(start_id_S1, len(S1.timeline)):
        if S1.timeline[i] > stop:
            break

        for j in range(start_id_S2, len(S2.timeline)):
            if S2.timeline[j] > stop:
                break

            d = round(abs(S2.timeline[j] - S1.timeline[i]), 3)  # distance between the two pulses
            if S1.timeline[i] <= S2.timeline[j] and d < S1.width:
                pulse_id_t1.append(i)
                pulse_id_t2.append(j)
                continue
            elif S1.timeline[i] >= S2.timeline[j] and d < S2.width:
                pulse_id_t1.append(i)
                pulse_id_t2.append(j)
                continue
            else:
                continue

    return pulse_id_t1, pulse_id_t2
def overlap_duo_combination(signals, perc=0):
    overlap = dict()
    for i in range(len(signals)):
        overlap[i] = list()

    for subset in itertools.combinations(signals, 2):
        p1, p2 = duo_overlap(subset[0], subset[1], perc=perc)
        overlap[signals.index(subset[0])] += p1
        overlap[signals.index(subset[1])] += p2

    return overlap
Explanation of the program:
I have square signals of width Signal.width and frequency Signal.fq, starting at Signal.t0 and ending at Signal.tf. During the initialization of a Signal instance, the timeline is computed: it is a 1D array of floats in which each number corresponds to the instant at which a pulse is triggered.
Example:
IN: Signal(50, 0, 250).timeline
OUT: array([ 0., 20., 40., 60., 80., 100., 120., 140., 160., 180., 200., 220., 240.])
A pulse is triggered at t = 0, t = 20, t = 40, ... Each pulse has a width of 0.3 ms.
duo_overlap() takes two signals as input (and a percentage, which we will keep fixed at 0 for this example). This function computes the IDs of the pulses of S1 and of S2 (their indices in the timeline array) that overlap with one another.
Example:
If a pulse starts at t = 0 for S1 and a pulse starts at t = 0.2 for S2, since 0.2 - 0 = 0.2 < 0.3 (S1.width), the 2 pulses overlap.
I tried to optimize this function by looping only over the IDs where the pulses can possibly overlap (start_id, stop), computed ahead of time. But as you can see, this function is still really time-consuming because of the high number of calls.
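For concreteness, here is a small call illustrating the overlap rule above (the parameter values are just an example I made up, not my real data):

S1 = Signal(50, 0, 100)    # pulses at 0, 20, 40, 60, 80 ms
S2 = Signal(50, 0.2, 100)  # pulses at 0.2, 20.2, 40.2, 60.2, 80.2 ms

# Every S2 pulse starts 0.2 ms after an S1 pulse, and 0.2 < 0.3 (S1.width),
# so each pair of pulses overlaps:
print(duo_overlap(S1, S2, perc=0))
# should print: ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])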
The last function, overlap_duo_combination(), takes N signals as input in a list (or tuple / iterable) (2 <= N <= 6 in practice) and creates a dict() in which each key is the index of a signal in the input list, and the value is the list of its overlapping pulse IDs (the signals in the input list are compared two by two).
Example:
Input: signals = (S1, S2, S3). Pulse n°2 of S1 overlaps with pulse n°3 of S2, and pulse n°3 of S2 overlaps with pulse n°5 of S3.
Output: dict[0] = [2] / dict[1] = [3, 3] / dict[2] = [5]
The 3 appears twice for S2 because it is added the first time duo_overlap() is called on S1 and S2, and a second time when it is called on S2 and S3.
I don't want to remove duplicates, since they carry information about how many different pulses are overlapping (in this case, 2 pulses overlap with pulse n°3 of S2).
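To make this concrete, here is a small runnable example (the signal parameters are just values I made up to reproduce the situation above, not my real data):

S1 = Signal(50, 0, 100)    # pulses at 0, 20, 40, 60, 80 ms
S2 = Signal(50, 0.2, 100)  # each pulse 0.2 ms after an S1 pulse -> overlaps S1
S3 = Signal(50, 0.4, 100)  # each pulse 0.2 ms after an S2 pulse -> overlaps S2, but not S1 (0.4 >= 0.3)

print(overlap_duo_combination([S1, S2, S3]))
# should print (split over lines for readability):
# {0: [0, 1, 2, 3, 4],
#  1: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],   # every S2 pulse appears twice: once vs S1, once vs S3
#  2: [0, 1, 2, 3, 4]}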
Would you have any idea, suggestion, code or whatever to reduce the time complexity of this part of the code?
I am currently looking into a PyCUDA implementation, since I have an Nvidia 1080 Ti at my disposal, but I don't know the C language. Would it be worth moving to the GPU this inner function, which doesn't take long to execute per call but is called thousands of times?
Thanks for reading such a long post, and thanks for the help!