I am stuck trying to solve this problem:
I have a set of data points that correspond to a set of time values, i.e. values = [1,2,3,4,5,6,7,8,4] and times = [0.1,0.2,0.3,0.4]...
and so on; the sample rate is 10 Hz.
I need to find the time spent between two limits. For example, if my limits are 3 and 5 inclusive, then 3, 4, 5 and 4 are within my limits.
If I calculate the time as (number of points - 1) / sample rate, or from the start and end times of each group, I miss the isolated data point (the second 4).
I have proposed using an imaginary buffer of half the sample period on either side to model isolated events.
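To make the problem concrete, here is the arithmetic for the example above. It assumes a sample period $T = 1/10$ s and reads the proposed buffer as half a period on each side of an isolated point; that reading is an interpretation, not necessarily the exact rule intended. The in-range runs are $\{3, 4, 5\}$ ($n = 3$) and the lone $\{4\}$ ($n = 1$):

```latex
t_{\text{naive}}  = \frac{3-1}{10} + \frac{1-1}{10} = 0.2\ \text{s}
  \quad\text{(the isolated point contributes nothing)}

t_{\text{buffer}} = \frac{3-1}{10} + 2 \cdot \frac{T}{2} = 0.2 + 0.1 = 0.3\ \text{s}
```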
But I am having trouble implementing this in code (Python). I need a way of iterating through the points and counting the number of points in each group, so that I can compute (number of points - 1) / sample rate, and also a way to pick up the single points and apply the buffer value to them. I can't seem to find the right combination of if and while statements to do this.
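One way to express the counting with a plain for loop and if statements is sketched below. It is only a sketch: the names run_lengths and time_in_limits are hypothetical, and the buffer parameter assumes one reading of the proposal (half a sample period on each side of an isolated point, while longer runs use (n - 1) / sample rate).

```python
def run_lengths(values, lo_lim, hi_lim):
    """Return the lengths of consecutive runs of values inside [lo_lim, hi_lim]."""
    runs = []
    count = 0  # length of the run currently being counted
    for v in values:
        if lo_lim <= v <= hi_lim:
            count += 1          # still inside the limits: extend the run
        elif count:
            runs.append(count)  # just left the limits: close the run
            count = 0
    if count:
        runs.append(count)      # a run that lasts until the final sample
    return runs


def time_in_limits(values, lo_lim, hi_lim, sample_rate, buffer=0.5):
    """Hypothetical rule: (n - 1) periods for a run of n > 1 points,
    plus `buffer` periods on each side of an isolated point."""
    period = 1.0 / sample_rate
    total = 0.0
    for n in run_lengths(values, lo_lim, hi_lim):
        if n > 1:
            total += (n - 1) * period
        else:
            total += 2 * buffer * period
    return total


values = [1, 2, 3, 4, 5, 6, 7, 8, 4]
print(run_lengths(values, 3, 5))                        # [3, 1]
print(round(time_in_limits(values, 3, 5, 10.0), 6))     # 0.3
```

The loop only has to remember one number (the current run length), and a run is closed either when a value leaves the limits or when the data ends; forgetting that second case is a common source of off-by-one errors.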
Here's a function that does what you want. Runs of multiple in-range data points are given a time value equal to the number of data points times the sampling period (i.e. the reciprocal of the sampling frequency); isolated single points are given a value of half the sampling period.
```python
#!/usr/bin/env python
''' Estimate time of data points falling within specified limits

From http://stackoverflow.com/q/29430625/4014959

Written 2015.04.03 by PM 2Ring,
with help from Antti Haapala and Martijn Pieters
'''

from itertools import groupby


def estimate_time(values, lo_lim, hi_lim, sample_rate):
    # Find values that are in range
    in_range = [lo_lim <= v <= hi_lim for v in values]

    # Find runs of in-range values (groupby groups consecutive equal keys)
    runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]

    # Estimate total time spent in range: a run of n > 1 points counts
    # as n samples, an isolated point as half a sample
    total_time = sum(v if v > 1 else 0.5 for v in runs)
    return total_time / sample_rate


values = [1, 2, 3, 4, 5, 6, 7, 8, 4]
sample_rate = 10.0  # in Hz
lo_lim = 3
hi_lim = 5
print(estimate_time(values, lo_lim, hi_lim, sample_rate))
```
Output:
0.35
To check that this code really does what you want, you can put some print statements into estimate_time() to show the contents of in_range and runs.
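As a sketch of that check, the intermediate values can be printed outside the function for the same example data. The pairs variable is a hypothetical extra step that shows the raw (key, group length) blocks produced by groupby before the False blocks are filtered out:

```python
from itertools import groupby

values = [1, 2, 3, 4, 5, 6, 7, 8, 4]
lo_lim, hi_lim = 3, 5

in_range = [lo_lim <= v <= hi_lim for v in values]
print(in_range)
# [False, False, True, True, True, False, False, False, True]

# Every block of equal consecutive keys, with its length
pairs = [(k, len(list(g))) for k, g in groupby(in_range)]
print(pairs)
# [(False, 2), (True, 3), (False, 3), (True, 1)]

# Keep only the in-range (True) blocks, as estimate_time() does
runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]
print(runs)
# [3, 1]
```

With runs == [3, 1], the total is 3 + 0.5 = 3.5 samples, i.e. 0.35 s at 10 Hz.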
One thing you can do to reduce memory requirements is to convert the list comprehensions into generator expressions. A list comprehension has to build a whole new list in memory (which is freed once it goes out of scope); a generator expression can be a little slower, but it doesn't build a list: results are generated as they're needed. The syntax is very similar: just replace the square brackets of the list comp with round brackets to turn it into a gen exp.
So change
```python
in_range = [lo_lim <= v <= hi_lim for v in values]
```
to
```python
in_range = (lo_lim <= v <= hi_lim for v in values)
```
and
```python
runs = [sum(1 for _ in group) for v, group in groupby(in_range) if v]
```
to
```python
runs = (sum(1 for _ in group) for v, group in groupby(in_range) if v)
```
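Put together, the generator version might look like the following sketch (estimate_time_gen is a hypothetical name used to keep it apart from the original function). One caveat worth knowing: a generator can be consumed only once, so if you print list(in_range) for debugging inside the generator version, groupby will then see an empty iterator.

```python
from itertools import groupby


def estimate_time_gen(values, lo_lim, hi_lim, sample_rate):
    # Same logic as estimate_time(), but with generator expressions,
    # so neither in_range nor runs is built as a list
    in_range = (lo_lim <= v <= hi_lim for v in values)
    runs = (sum(1 for _ in group) for v, group in groupby(in_range) if v)
    total_time = sum(v if v > 1 else 0.5 for v in runs)
    return total_time / sample_rate


print(estimate_time_gen([1, 2, 3, 4, 5, 6, 7, 8, 4], 3, 5, 10.0))  # 0.35

# Caveat: a generator is exhausted after one pass
gen = (v > 2 for v in [1, 2, 3])
first = list(gen)
second = list(gen)
print(first)   # [False, False, True]
print(second)  # [] (nothing left)
```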