Recurrence plot in python

2019-07-18 18:30发布

问题:

I am trying to clusterize paterns in time series as I ask in

How to clustering syllable types with python?

I try using to solve my problem using the recurrence plots technique, so I make some code in python to reproduce these plots. I want o know if my code is ok, I tried it with a sound temporal series and I am getting this kind of result depending on the distance parameter value:

http://ceciliajarne.web.unq.edu.ar/envelope-problem/

Also I include the data set. I am using ch2. This is my code:

import numpy as np
import scipy
import os

from scipy.io import wavfile
import wave, struct
import matplotlib.pyplot as pp

from pylab import *

import scipy.signal.signaltools as sigtool
import scipy, pylab
from scipy.io import wavfile
import wave, struct
import scipy.signal as signal
from scipy.fftpack import fft

 #Data set input
data=np.random.rand(44000*3) 
#random secuence to compare with almost 3 seconds of data, cold be other
print 'data:', data

#set size 
sissse=data.size
print 'size: ',sissse
print '---------------'

#empty vectors 

x_filt_all_p=[]
y_filt_all_p=[]
los_p_filt_all_p=[]

#creating the list to fill 
dif=[]
dif_abs=[]
p=1

#for each i-element of data vector for each p

for p in range(1,sissse,4400):
    for i in enumerate(data):
        #print i
        j=i[0]
        #print 'j: ',j
        if (j<sissse-p):
            dif_aux=data[j+p]-data[j]
            #print 'dif=',dif_aux
            dif.append(dif_aux)
                dif_abs.append(abs(data[j+p]-data[j]))      
            #print'.........'

    print'.........'
    #print 'dif=',dif
    print'.........'
    #print 'Absolute difference=',dif_abs
    print'.........'

    #vector with index and diferences in absolute value

    pepe= np.vstack([np.arange(len(dif_abs)),dif_abs])

    print 'pepe0: ', pepe[0]
    xx=pepe[0]
    print 'pepe1: ', pepe[1]
    yy=pepe[1]

    #filtering the elements with diference<delta

    delta= 0.001

    # Now let's extract only the part of the data we're interested in...

    los_p = np.empty(len(pepe[1]))#dif_abs
    los_p.fill(p)

    x_filt    = xx[yy<delta]
    y_filt    = yy[yy<delta] 
    los_p_filt= los_p[yy<delta]

    print 'value of coordinate i', x_filt
    print 'absolute difference', y_filt 
    print 'value of coordinate p', los_p_filt
    print '------------------------'
    if (p==1):
        x_filt_all_p=x_filt
        y_filt_all_p=y_filt
        los_p_filt_all_p=los_p_filt
    else:
        x_filt_all_p=np.concatenate((x_filt_all_p,x_filt)) 
        y_filt_all_p=np.concatenate((y_filt_all_p,y_filt))
        los_p_filt_all_p=np.concatenate((los_p_filt_all_p,los_p_filt))

print 'full value of coordinate i: ', x_filt_all_p
print 'full absolute difference', y_filt_all_p 
print 'full value of coordinate p: ', los_p_filt_all_p

#trying to plot the "recurrence plots" together with the envelope.

pp.subplot(211)
pp.plot(arange(data.size),data, color='c',label='Time Signal 2')
pp.legend(fontsize= 'small')
pp.grid(True)
pp.xlabel('Time (s)')
pp.ylabel('Amplitude')  
#pp.xlim([0,3])

pp.subplot(212)
base='test_plot'
pp.title('Recurrence plot delta=')

markerline2, stemlines2, baseline2 = stem(x_filt_all_p*float(1)/float(w[0]), los_p_filt_all_p*float(1)/float(w[0]),'b',linefmt=" ",)
pp.matplotlib.markers.MarkerStyle('.')
setp(markerline2,'markerfacecolor','b',label='points')
pp.legend(fontsize= 'small')
pp.grid(True)
pp.xlabel('Time i [s]')
pp.ylabel('Time p [s]') 

#pp.xlim(0,3)
#pp.ylim(0,3)
pp.show()
#pp.savefig('plots/%s.jpg' %(str(base))
pp.close()  

But I am not sure 100% that my code I working ok. Could someone take a look to my code to give me some advise of how to test it? I don't want to use neither matlab nor mathematica. The idea was to create an independent code in python. Also I have another smaller problem, I could not change the dot size in my plot. Finally I alrady try to use crosscheck with http://recurrence-plot.tk/online/index.php?state= butt I couldn't make it work. Any suggestion on my code or possible crosscheck wold be very welcome. Thanks in advance

回答1:

I understand this question is quite old, but maybe someone will stumble on this in future.

Since you are already using NumPy let me suggest this snippet:

import numpy as np

def rec_plot(s, eps=None, steps=None):
    if eps==None: eps=0.1
    if steps==None: steps=10
    N = s.size
    S = np.repeat(s[None,:], N, axis=0)
    Z = np.floor(np.abs(S-S.T)/eps)
    Z[Z>steps] = steps

    return Z

It initially creates square empty array of (N, N) size. Then it subtract all possible combinations of points via S-S.T, which is implicitly equivalent to having matrix subtraction where one matrix has all rows S and the other all columns S.

Dividing by eps and flooring is for a short hand for asking of how many eps difference is between those points. Then Z[Z>steps] is bounding, so that whenever something is more than steps times eps from the point, then it's max and will be simply plot with same value.

This solution is suboptimal as it first creates two NxN matrices, which for large N is too much. For N>10000 this is definitely not good. Since you are using SciPy we can use its distance library. Below is more optimal implementation:

import numpy as np
from scipy.spatial.distance import pdist, squareform

def rec_plot(s, eps=None, steps=None):
    if eps==None: eps=0.1
    if steps==None: steps=10
    d = pdist(s[:,None])
    d = np.floor(d/eps)
    d[d>steps] = steps
    Z = squareform(d)
    return Z

Examples of usage you can find https://laszukdawid.com/tag/recurrence-plot/ or https://github.com/laszukdawid/recurrence-plot.