I have a text file with timestamps and labels like this:
0.000000 14.463912 tone
14.476425 16.891247 noise
16.891247 21.232923 not_music
21.232923 23.172289 not_music
23.172289 29.128018 not_music
If I specify a step size of 1 second, I want this list to explode into time frames that are each 1 second long but still carry the nearest label. How do I explode the time ranges into smaller steps while keeping the labels accurate?
For example, if my step were 1 second, then the first line would become about 14 lines like:
0.0 1.0 tone
1.0 2.0 tone
.
.
.
13.0 14.0 tone
[14.0, 14.46] and [14.47, 15.0] # fall in a grey zone, don't know what to do
15.0 16.0 noise
So far I have managed to read in the text file and store its rows in a list like this:
my_segments = []
# Each line is expected to hold: start<TAB>end<TAB>label
with open('./data/annotate.txt', 'r') as f:
    for line in f:
        start, end, label = line.split("\t")
        start = float(start)
        end = float(end)
        label = label.strip()
        my_segments.append((start, end, label))
# print(my_segments)
for segment in my_segments:
    print(segment)
I looked at https://stackoverflow.com/a/18265979/4932791 by @Jared, which details how to create a range between two numbers with a given step size using numpy, like so:
>>> numpy.arange(11, 17, 0.5)
array([ 11. , 11.5, 12. , 12.5, 13. , 13.5, 14. , 14.5, 15. ,
15.5, 16. , 16.5])
I am unable to figure out how to do something similar over a list of ranges.
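One way to read "a range of ranges" is to call numpy.arange once per segment. The sketch below is only an illustration: the helper name split_segment is made up, and it splits each range starting from its own start time (so chunks are not aligned to whole seconds), clipping the last chunk at the segment's end.

```python
import numpy as np

def split_segment(start, end, label, step=1.0):
    """Split one (start, end, label) range into chunks of at most `step` seconds."""
    # Chunk starts: start, start + step, ... (all below `end`).
    starts = np.arange(start, end, step)
    # Each chunk ends one step later, clipped so the last one stops at `end`.
    ends = np.minimum(starts + step, end)
    return [(float(s), float(e), label) for s, e in zip(starts, ends)]

# Two rows from the sample file above.
segments = [(0.0, 14.463912, "tone"), (14.476425, 16.891247, "noise")]
frames = [f for seg in segments for f in split_segment(*seg)]
```

Every chunk here inherits the label of the range it came from, so there is no grey zone, but the last chunk of each range is usually shorter than the step. With non-integer steps, floating-point rounding in numpy.arange can occasionally change the number of chunks by one.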
The pseudocode/algorithm I managed to come up with is:
- step 1 - take a step size,
- step 2 - set a left_variable and a right_variable one step size apart, marking the edges of the current window,
- step 3 - move this window over each range and check whether it falls within the range; if it does, assign it the corresponding label,
- step 4 - now move left and right forward by 1 step,
- step 5 - repeat from step 3 until the end of the file is reached.
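The steps above could be sketched roughly as follows. This is a minimal illustration, not a tested solution for every file: the function name label_windows is hypothetical, windows are placed on a global grid starting at 0, and a window that is not fully inside any single range gets the label None (the grey zone).

```python
import math

def label_windows(segments, step=1.0):
    """Slide a fixed window over the timeline; label it only if one range contains it."""
    total_end = max(end for _, end, _ in segments)
    n_windows = int(math.ceil(total_end / step))
    frames = []
    for k in range(n_windows):
        # Compute edges from k to avoid accumulating floating-point error.
        left, right = k * step, (k + 1) * step
        label = None  # None marks a window that straddles a boundary or a gap
        for start, end, seg_label in segments:
            if start <= left and right <= end:
                label = seg_label
                break
        frames.append((left, right, label))
    return frames

segments = [
    (0.000000, 14.463912, "tone"),
    (14.476425, 16.891247, "noise"),
    (16.891247, 21.232923, "not_music"),
]
frames = label_windows(segments)
```

With these three rows, the window from 13.0 to 14.0 is labelled tone, 15.0 to 16.0 is labelled noise, and 14.0 to 15.0 stays unlabelled because it crosses the tone/noise boundary.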
I think that, to handle edge cases, I should reduce the step size to 0.25 seconds or so and add a condition: if the current step has at least 40 or 50% overlap with a range, I assign that range's label.
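The overlap idea might look something like this. It is a sketch under assumptions: the name label_by_overlap and the parameter min_fraction are invented for illustration, overlap is summed per label (so a window spanning two adjacent not_music ranges still counts as not_music), and windows below the threshold get None.

```python
def label_by_overlap(segments, step=0.25, min_fraction=0.5):
    """Label each window with the label covering the largest share of it."""
    total_end = max(end for _, end, _ in segments)
    frames = []
    k = 0
    while k * step < total_end:
        left, right = k * step, (k + 1) * step
        # Total time inside this window covered by each label.
        overlap_by_label = {}
        for start, end, seg_label in segments:
            overlap = min(right, end) - max(left, start)
            if overlap > 0:
                overlap_by_label[seg_label] = overlap_by_label.get(seg_label, 0.0) + overlap
        label = None
        if overlap_by_label:
            best = max(overlap_by_label, key=overlap_by_label.get)
            # Keep the label only if it covers enough of the window.
            if overlap_by_label[best] / step >= min_fraction:
                label = best
        frames.append((left, right, label))
        k += 1
    return frames

segments = [
    (0.000000, 14.463912, "tone"),
    (14.476425, 16.891247, "noise"),
    (16.891247, 21.232923, "not_music"),
    (21.232923, 23.172289, "not_music"),
    (23.172289, 29.128018, "not_music"),
]
frames = label_by_overlap(segments, step=1.0, min_fraction=0.5)
```

With a 1 second step and a 50% threshold, the window from 14.0 to 15.0 ends up as noise, since noise covers about 0.52 s of it against about 0.46 s for tone. The final window from 29.0 to 30.0 is mostly past the end of the data and stays unlabelled.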
Update: my non-working solution:
sliding_window = 0
#st,en = [0.0,1.0]
jumbo= []
for i in range(len(hold_segments)):
    if sliding_window > hold_segments[i][0] and sliding_window+1 < hold_segments[i][1]:
        jumbo.append((sliding_window,sliding_window+1,hold_segments[i][2]))
        sliding_window=sliding_window+1
        print hold_segments[i][2]
With pandas that's quite straightforward, assuming you've loaded your data into a dataframe called df such as:
Then restore the ranges with:
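A minimal sketch of both steps, for illustration only and not necessarily the code this answer used. The column names start, end and label, the rule of labelling each window by the range that contains its midpoint, and the use of pandas.merge_asof are all assumptions.

```python
import numpy as np
import pandas as pd

# Assumed layout of df: one row per labelled range.
df = pd.DataFrame(
    [
        (0.000000, 14.463912, "tone"),
        (14.476425, 16.891247, "noise"),
        (16.891247, 21.232923, "not_music"),
    ],
    columns=["start", "end", "label"],
)
step = 1.0

# Fixed-size windows on a global grid: [0, step), [step, 2*step), ...
n_windows = int(np.ceil(df["end"].max() / step))
windows = pd.DataFrame({"win_start": np.arange(n_windows) * step})
windows["win_end"] = windows["win_start"] + step
windows["mid"] = windows["win_start"] + step / 2

# For each window midpoint, find the last range that starts at or before it.
merged = pd.merge_asof(windows, df.sort_values("start"), left_on="mid", right_on="start")

# A midpoint at or past that range's end lies in a gap, so drop the label.
merged.loc[merged["mid"] >= merged["end"], "label"] = None

result = merged[["win_start", "win_end", "label"]]
```

Here result holds one row per window, and the window from 14.0 to 15.0 is labelled noise because its midpoint, 14.5, falls inside the noise range.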
I hope the comments make it clear what the code does. It also works well for a non-integer step size.