Python - finding pattern in a plot

2020-02-11 16:54发布

问题:

This graph is generated by the following gnuplot script. The estimated.csv file is found in this link: https://drive.google.com/open?id=0B2Iv8dfU4fTUaGRWMm9jWnBUbzg

# ###### GNU Plot
   set style data lines
   set terminal postscript eps enhanced color "Times" 20

   set output "cubic33_cwd_estimated.eps"

   set title "Estimated signal"

    set style line 99 linetype 1 linecolor rgb "#999999" lw 2
    #set border 1 back ls 11
    set key right top
    set key box linestyle 50
    set key width -2
    set xrange [0:10]
    set key spacing 1.2
    #set nokey

    set grid xtics ytics mytics
    #set size 2
    #set size ratio 0.4

    #show timestamp
    set xlabel "Time [Seconds]"
    set ylabel "Segments"

    set style line 1 lc rgb "#ff0000" lt 1 pi 0 pt 4 lw 4 ps 0

    # Congestion control send window

    plot  "estimated.csv" using ($1):2 with lines title "Estimated";

I wanted to find the pattern of the estimated signal of the previous plot something close to the following plot. My ground truth (actual signal is shown in the following plot)

Here is my initial approach

#!/usr/bin/env python
import sys

import numpy as np
from shapely.geometry import LineString
#-------------------------------------------------------------------------------
def load_data(fname):
    return LineString(np.genfromtxt(fname, delimiter = ','))
#-------------------------------------------------------------------------------
lines = list(map(load_data, sys.argv[1:]))

for g in lines[0].intersection(lines[1]):
    if g.geom_type != 'Point':
        continue
    print('%f,%f' % (g.x, g.y))

Then invoke this python script in my gnuplot directly as in the following:

set terminal pngcairo
set output 'fig.png'

set datafile separator comma
set yr [0:700]
set xr [0:10]

set xtics 0,2,10
set ytics 0,100,700

set grid

set xlabel "Time [seconds]"
set ylabel "Segments"

plot \
    'estimated.csv' w l lc rgb 'dark-blue' t 'Estimated', \
    'actual.csv' w l lc rgb 'green' t 'Actual', \
    '<python filter.py estimated.csv actual.csv' w p lc rgb 'red' ps 0.5 pt 7 t ''

which gives us the following plot. But this does not seem to give me the right pattern as gnuplot is not the best tool for such tasks.

Is there any way where we can find the pattern of the first graph (estimated.csv) by forming the peaks into a plot using python? If we see from the end, the pattern actually seems to be visible. Any help would be appreciated.

回答1:

I think pandas.rolling_max() is the right approach here. We are loading the data into a DataFrame and calculate the rolling maximum over 8500 values. Afterwards the curves look similar. You may test with the parameter a little bit to optimize the result.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
plt.ion()
names = ['actual.csv','estimated.csv']
#-------------------------------------------------------------------------------
def load_data(fname):
    return np.genfromtxt(fname, delimiter = ',')
#-------------------------------------------------------------------------------

data = [load_data(name) for name in names]
actual_data = data[0]
estimated_data = data[1]
df = pd.read_csv('estimated.csv', names=('x','y'))
df['rolling_max'] = pd.rolling_max(df['y'],8500)
plt.figure()
plt.plot(actual_data[:,0],actual_data[:,1], label='actual')
plt.plot(estimated_data[:,0],estimated_data[:,1], label='estimated')
plt.plot(df['x'], df['rolling_max'], label = 'rolling')

plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0,10)
plt.ylim(0,500)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)

To answer the question from the comments:

Since pd.rolling() is generating defined windows of your data, the first values will be NaN for pd.rolling().max. To replace these NaNs, I suggest to turn around the whole Series and to calculate the windows backwards. Afterwards, we can replace all the NaNs by the values from the backwards calculation. I adjusted the window length for the backwards calculation. Otherwise we get erroneous data.

This code works:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
plt.ion()

df = pd.read_csv('estimated.csv', names=('x','y'))
df['rolling_max'] = df['y'].rolling(8500).max()
df['rolling_max_backwards'] = df['y'][::-1].rolling(850).max()
df.rolling_max.fillna(df.rolling_max_backwards, inplace=True)
plt.figure()
plt.plot(df['x'], df['rolling_max'], label = 'rolling')

plt.legend()
plt.title('Actual vs. Interpolated')
plt.xlim(0,10)
plt.ylim(0,700)
plt.xlabel('Time [Seconds]')
plt.ylabel('Segments')
plt.grid()
plt.show(block=True)

And we get the following result: