I have a big dataset with 52166 data points, which looks like this:
bc_conc
2010-04-09 10:00:00 609.542000
2010-04-09 11:00:00 663.500000
2010-04-09 12:00:00 524.661667
2010-04-09 13:00:00 228.706667
2010-04-09 14:00:00 279.721667
It is a pandas DataFrame with a datetime index. I would like to plot bc_conc against time and add a trendline.
I used the following code:
data = data.resample('M', closed='left', label='left').mean()
x1 = data.index
x2 = matplotlib.dates.date2num(data.index.to_pydatetime())
y = data.bc_conc
z = np.polyfit(x2, y, 1)
p = np.poly1d(z)
fig = plt.figure()
ax1 = fig.add_subplot(1, 1, 1)
plt.plot_date(x=x1, y=y, fmt='b-')
plt.plot(x1, p(x2), 'ro')
plt.show()
However, as you can see, I resampled my data. I did this because if I don't, the code just gives me a plot of the data without the trendline. If I resample to days, the plot still has no trendline. If I resample to months, a trendline shows.
It seems as if the code only works for a smaller dataset. Why is this? I was wondering if anyone could explain this to me, because I would like to resample my data to days, but no further.
Thanks in advance
This code seems to work fine, whether using hourly or daily resampled data.
Starting with 100,000 data points:
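The snippet that belonged here is not preserved, so the following is only a minimal sketch of such a starting point. The values are synthetic (a slow upward drift plus Gaussian noise, both assumptions chosen for illustration); only the column name `bc_conc` and the first timestamp come from the question.

```python
import numpy as np
import pandas as pd

n = 100_000

# Hourly DatetimeIndex starting at the first timestamp shown in the question.
index = pd.date_range("2010-04-09 10:00:00", periods=n, freq=pd.Timedelta(hours=1))

# Hypothetical stand-in for the real measurements: drift plus noise.
rng = np.random.default_rng(0)
values = 500 + 0.001 * np.arange(n) + rng.normal(0, 100, n)

data = pd.DataFrame({"bc_conc": values}, index=index)
```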
Calculation of trendline with optional resampling:
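The original code for this step is also missing, so this is a sketch under stated assumptions rather than the exact code that was posted. The helper names `trendline` and `plot_trend` are hypothetical, and the data is synthetic. Rows with NaN are dropped before fitting because `np.polyfit` does not skip missing values.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic hourly data (an assumption; substitute the real DataFrame).
n = 100_000
rng = np.random.default_rng(0)
index = pd.date_range("2010-04-09 10:00:00", periods=n, freq=pd.Timedelta(hours=1))
data = pd.DataFrame(
    {"bc_conc": 500 + 0.001 * np.arange(n) + rng.normal(0, 100, n)}, index=index
)


def trendline(data, rule=None):
    """Fit a straight line to bc_conc against time, optionally resampling first."""
    if rule is not None:
        data = data.resample(rule).mean()  # e.g. rule="D" for daily means
    y = data["bc_conc"].dropna()  # polyfit does not skip NaN values
    x = mdates.date2num(y.index.to_pydatetime())  # dates as floats (days)
    slope, intercept = np.polyfit(x, y.to_numpy(), 1)
    return y, slope * x + intercept


def plot_trend(data, rule=None):
    """Plot the (optionally resampled) series with its linear trend."""
    y, fit = trendline(data, rule)
    fig, ax = plt.subplots()
    ax.plot(y.index, y.to_numpy(), "b-", label="bc_conc")
    ax.plot(y.index, fit, "r-", label="linear trend")
    ax.legend()
    return fig
```

Calling `plot_trend(data)` uses the hourly values as they are, while `plot_trend(data, "D")` fits the daily means; follow either call with `plt.show()` to display the figure.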
Yields: