I'd like to smooth a scatter plot shown below (the points are very dense), and the data is here.
There is a lot of noise in the middle of the curve. I'd like to smooth it out, and the y values should also increase monotonically.
Since there are many curves like this, it is hard to know in advance where the noise sits in each curve.
I tried scipy.signal.savgol_filter, but it didn't work.
The code I used is:
from scipy.signal import savgol_filter
from scipy import interpolate
import numpy as np
import matplotlib.pyplot as plt
s = np.loadtxt('data.csv', delimiter=',')
x = s[:, 0]
y = s[:, 1]
yhat = savgol_filter(y, 551, 3)
plt.plot(x, y, 'r')
plt.plot(x, yhat, 'b')
plt.show()
Suggestions are really appreciated. Thanks!
-------------------update-------------------------
Following Colin's method, I got the results I wanted. Here is the code:
from scipy.signal import savgol_filter, medfilt
from scipy import interpolate
import numpy as np
import matplotlib.pyplot as plt
s = np.loadtxt('data.csv', delimiter=',')
x = s[:, 0]
y = s[:, 1]
yhat = savgol_filter(y, 551, 3)
tolerance = 0.2
increased_span = 150
filter_size = 11
# a short median filter follows the trend; large squared residuals mark the noisy region
first_pass = medfilt(y, filter_size)
diff = (y - first_pass)**2
# first and last samples above the tolerance, widened by increased_span and clamped to valid indices
first = max(np.argmax(diff > tolerance) - increased_span, 0)
last = min(len(y) - np.argmax(diff[::-1] > tolerance) + increased_span, len(y) - 1)
print(first, last)
# interpolate linearly across the widened span
yhat[first:last] = np.interp(x[first:last], [x[first], x[last]], [y[first], y[last]])
# resample on a regular grid and smooth once more
f = interpolate.interp1d(x, yhat, kind='slinear')
x_inter = np.linspace(x[0], x[-1], 1000)
y_inter = f(x_inter)
y_inter = savgol_filter(y_inter, 41, 3)
plt.plot(x, y, 'r')
plt.plot(x, yhat, 'b')
# the resampled, re-smoothed curve
plt.plot(x_inter, y_inter, 'g')
plt.show()
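Neither savgol_filter nor the interpolation guarantees a monotonically increasing result, so a final pass may still be needed for that requirement. Below is a minimal sketch of one possible approach, a running maximum. The helper name enforce_monotonic_increase and the synthetic data are assumptions made for illustration; they are not part of Colin's method or of data.csv.

```python
import numpy as np

def enforce_monotonic_increase(y):
    """Return a non-decreasing copy of y using a running maximum.

    Hypothetical helper for illustration only.
    """
    return np.maximum.accumulate(np.asarray(y, dtype=float))

# Synthetic stand-in for a smoothed curve (assumed data, not data.csv):
# a rising trend with small noise that causes local dips.
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 10.0, 500)
y_demo = x_demo + rng.normal(scale=0.05, size=x_demo.size)

y_mono = enforce_monotonic_increase(y_demo)
```

A running maximum never goes below the input, so it biases the curve upward wherever the smoothed data dips. If that matters, isotonic regression (for example sklearn.isotonic.IsotonicRegression) is an alternative that fits a least-squares monotone curve instead.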
If we first isolate the trouble area, there are many ways to remove it. Here is an example: