I have two variables, let's call them x and y, which when plotted are the scattered blue points in the graph. I have fitted them using curve_fit from SciPy.
I want to generate (let's say 500000) "smoothed" random numbers replicating the distribution followed by x and y.
By "smoothed" I mean that I don't want randoms that exactly replicate my data (x and y) like in the figure below, where the red diamonds are my data distribution and the histogram is my generated randoms (even the fluctuations of the data are replicated here!). I want a "smoothed" histogram.
What I have tried so far is to fit the points x and y using curve_fit from SciPy, so I now know what the data distribution is. Next I need to create random numbers that follow that fit/distribution.
P.S. I have also tried creating uniform randoms from 0 to 1 and keeping the points below the fitted curve, but I don't know how!
I propose that you take your data distribution fit and then add some random "noise" to it. This should produce data that still follows your distribution but is randomised for whatever purpose you require.
Below is some code which takes a data distribution fit (in the function curve) and then randomises the values retrieved from it using the numpy.random module.
import numpy as np
import matplotlib.pyplot as plt

# I don't have your data but let's assume that this function
# replicates the data distribution you want to work with.
def curve(x):
    return 2. * x + 5.

N = 100  # Number of points to generate.
x = np.linspace(0, 1, N)
y_fit = curve(x)
# margin controls how "noisy" you want your fit to be.
margin = 0.5
noise = margin*(np.random.random(N)-0.5)
y_ran = y_fit + noise
plt.plot(x, y_fit) # Plot the fitted distribution.
plt.plot(x, y_ran, 'rx') # Plot the noisy data.
plt.show()
Note that this only creates 100 randomised results; you could modify the code to generate as many as you need.
What I think you might be able to do is rescale your fit to the y-range [0,1] and then run the following loop:
- generate a random x value
- for this x value, generate a random y value in the range [0,1]
- if this y value is below the value of the rescaled fit at that x value, accept the x value; otherwise discard the x-y pair and go to the next iteration of the loop
This should give you a set of random numbers that follow your smoothed distribution.
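The loop above is usually called rejection sampling. Here is a minimal vectorised sketch of it, assuming NumPy is available. The function fitted_curve is a hypothetical stand-in for whatever curve_fit returned (a Gaussian bump is used purely for illustration), and rejection_sample, batch, and the x-range [0, 1] are likewise names and values chosen for this example, not part of the original question.

```python
import numpy as np

def fitted_curve(x):
    # Hypothetical stand-in for the fitted distribution; replace this
    # with your own model evaluated at the curve_fit parameters.
    return np.exp(-0.5 * ((x - 0.5) / 0.15) ** 2)

def rejection_sample(pdf, x_min, x_max, n_samples, rng, batch=100_000):
    # Estimate the maximum of the curve on a fine grid so that
    # pdf(x) / f_max is (approximately) rescaled to the range [0, 1].
    grid = np.linspace(x_min, x_max, 10_001)
    f_max = pdf(grid).max()

    accepted = []
    n_accepted = 0
    while n_accepted < n_samples:
        # Draw a batch of candidate x values and matching uniform y values.
        x = rng.uniform(x_min, x_max, batch)
        y = rng.uniform(0.0, 1.0, batch)
        # Keep only the x values whose y falls below the rescaled curve.
        keep = x[y < pdf(x) / f_max]
        accepted.append(keep)
        n_accepted += keep.size

    # Trim the surplus from the last batch.
    return np.concatenate(accepted)[:n_samples]

rng = np.random.default_rng(0)
samples = rejection_sample(fitted_curve, 0.0, 1.0, 500_000, rng)
```

A histogram of samples (for example with plt.hist(samples, bins=100)) should then trace the fitted curve rather than the fluctuations of the original data. The acceptance rate equals the area under the rescaled curve divided by the area of the bounding box, so a very narrow peak will need more batches.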