Counting data points within limits, and applying b

Posted 2019-08-08 10:45

Question:

I am stuck trying to solve this problem:

I have a set of data points that correspond to a set of time values, e.g. values = [1,2,3,4,5,6,7,8,4] and times = [0.1,0.2,0.3,0.4]... and so on; the sample rate is 10 Hz.

I need to find the time spent between two limits. For example, if my limits are 3 and 5 inclusive, then 3, 4, 5, 4 are within my limits.

If I calculate the time as (number of points - 1) / sample rate, or from the start time and end time of each grouping, I will miss the isolated data point (the second 4).

I have proposed using an imaginary buffer of half the sample period on either side to model the isolated events.

But I am having trouble implementing this in code (Python). I need a way of iterating through the points and counting the number of points in each grouping, so that I can compute (number of points - 1) / sample rate. I also need a way to pick up the singular points and apply the buffer value to them.

I can't seem to find the right combination of if and while statements to do this.

Answer 1:

Here's a function that does what you want. Runs of multiple data points that are within the specified limits are given a time value equal to the number of data points times the sampling period (i.e. the reciprocal of the sampling frequency); isolated single points are given a value of half the sampling period.

#!/usr/bin/env python

''' Estimate time of data points falling within specified limits 
    From http://stackoverflow.com/q/29430625/4014959
    Written 2015.04.03 by PM 2Ring,
    with help from Antti Haapala and Martijn Pieters 
'''

from itertools import groupby

def estimate_time(values, lo_lim, hi_lim, sample_rate):
    #Find values that are in range
    in_range = [lo_lim <= v <= hi_lim for v in values]

    #Find runs of in-range values
    runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]

    #Estimate total time spent in-range
    total_time = sum(v if v > 1 else 0.5 for v in runs)
    return total_time / sample_rate


values = [1, 2, 3, 4, 5, 6, 7, 8, 4]
sample_rate = 10.0  # in Hz

lo_lim = 3
hi_lim = 5

print(estimate_time(values, lo_lim, hi_lim, sample_rate))

output

0.35

To check that this code really does what you want, you can put some print statements into estimate_time() to show the contents of in_range and runs.
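
As a rough illustration of that check, here is a sketch of a debugging variant with the print calls added. The name estimate_time_verbose is hypothetical (it is not part of the original answer), and the sketch assumes Python 3 print syntax; the logic is otherwise the same as estimate_time().

```python
from itertools import groupby

def estimate_time_verbose(values, lo_lim, hi_lim, sample_rate):
    # Hypothetical debugging variant: same logic as estimate_time(),
    # plus prints of the intermediate lists.
    in_range = [lo_lim <= v <= hi_lim for v in values]
    print("in_range:", in_range)

    # Length of each consecutive run of True flags
    runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]
    print("runs:", runs)

    # Runs of 2+ points count in full; isolated points count as 0.5
    total_time = sum(n if n > 1 else 0.5 for n in runs)
    return total_time / sample_rate

result = estimate_time_verbose([1, 2, 3, 4, 5, 6, 7, 8, 4], 3, 5, 10.0)
print(result)
```

For the sample data, in_range should hold True at the positions of 3, 4, 5 and the final 4, and runs should be [3, 1]: one run of three points and one isolated point.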


One thing you can do to reduce memory requirements is to convert the list comprehensions into generator expressions. A list comprehension has to build a whole new list in memory (which is freed once nothing refers to it any more); a generator expression can be a little slower, but it doesn't need to build a list: results are generated as they're needed. The syntax is very similar: just replace the square brackets of the list comprehension with round brackets to turn it into a generator expression.

So change

in_range = [lo_lim <= v <= hi_lim for v in values]
to
in_range = (lo_lim <= v <= hi_lim for v in values)

and

runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]
to
runs = (sum(1 for _ in group) for v, group in groupby(in_range) if v)
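
Putting both changes together, the function might look like the sketch below. The name estimate_time_lazy is hypothetical, chosen only to distinguish it from the list-based version above; the arithmetic is unchanged.

```python
from itertools import groupby

def estimate_time_lazy(values, lo_lim, hi_lim, sample_rate):
    # Generator expression: in-range flags are produced one at a time
    in_range = (lo_lim <= v <= hi_lim for v in values)

    # Generator expression: each run length is computed only when requested
    runs = (sum(1 for _ in group) for v, group in groupby(in_range) if v)

    # sum() consumes runs exactly once, so no intermediate list is built
    total_time = sum(n if n > 1 else 0.5 for n in runs)
    return total_time / sample_rate

lazy_result = estimate_time_lazy([1, 2, 3, 4, 5, 6, 7, 8, 4], 3, 5, 10.0)
print(lazy_result)
```

One caveat with this form: a generator can only be iterated once, so printing the contents of in_range or runs for debugging (for example via list(runs)) would exhaust them before the final sum. For that kind of inspection the list-based version is easier to work with.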