Best fit curve for trend line

Posted 2019-06-21 16:01

Question:

Problem Constraints

  • Size of the data set, but not the data itself, is known.
  • Data set grows by one data point at a time.
  • Trend line is graphed one data point at a time (using a spline/Bezier curve).

Graphs

The collage below shows data sets with reasonably accurate trend lines:

The graphs are:

  • Upper-left. By hour, with ~24 data points.
  • Upper-right. By day for one year, with ~365 data points.
  • Lower-left. By week for one year, with ~52 data points.
  • Lower-right. By month for one year, with ~12 data points.

User Inputs

The user can select:

  • the type of time series (hourly, daily, monthly, quarterly, annual); and
  • the start and end dates for the time series.

For example, the user could select a daily report for 30 days in June.

Trend Weight

To calculate the window size (i.e., the number of data points to average when calculating the trend line), the following expression is used:

data points / trend weight

Here, data points is derived from the user inputs and trend weight is 6.4. Although a trend weight of 6.4 produces good fits, the value is rather arbitrary and might not be appropriate for other user inputs.
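To make the expression concrete, here is a minimal sketch of the window-size calculation combined with a trend line that is updated one data point at a time. The question does not say how the average is taken or how the window size is rounded, so the trailing (rather than centered) average, the rounding down, and the floor of 1 are all assumptions, and the names window_size and RunningTrend are hypothetical.

```python
from collections import deque

TREND_WEIGHT = 6.4  # the value quoted in the question


def window_size(data_points, trend_weight=TREND_WEIGHT):
    """Number of points to average: data points / trend weight.

    Assumption: the result is rounded down and never drops below 1.
    """
    return max(1, int(data_points / trend_weight))


class RunningTrend:
    """Trailing moving average, fed one data point at a time.

    Assumption: a simple trailing average stands in for whatever
    averaging the original charting code performs.
    """

    def __init__(self, window):
        self.values = deque(maxlen=window)
        self.total = 0.0

    def add(self, value):
        # Drop the oldest value from the running total once the window is full.
        if len(self.values) == self.values.maxlen:
            self.total -= self.values[0]
        self.values.append(value)
        self.total += value
        return self.total / len(self.values)


if __name__ == "__main__":
    # Approximate sizes of the four graphs in the question.
    for n in (24, 365, 52, 12):
        print(n, window_size(n))
```

Because the size of the data set is known up front, the window can be fixed before the first point arrives, which fits the constraint that the trend line is drawn incrementally.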

Question

How should trend weight be calculated given the constraints of this problem?

Answer 1:

Based on the look of the graphs, I would say you have too many points for your 12-point graph (the trend line is just a spline through the given points, which is visually pleasing but does more harm than good when trying to understand the trend) and too few points for your 365-point graph. Perhaps try raising the number of data points to a power, like:

(Data points)^1.2/14.1

I do realize this is even more arbitrary than what you already have, but arbitrary isn't the worst thing in the world.

(I got 14.1 by trying to keep the 52-point graph fixed, since that one looks nice: (52^1.2 / 52) * 6.4 = 14.1. Using this technique, you could try powers other than 1.2 to see what you get visually.)
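A minimal sketch of that calculation, for experimenting with other powers. The function names power_constant and power_window are hypothetical. The anchor of 52 points and the weight of 6.4 come from the text above, and the result is left unrounded because the answer does not say how to round it.

```python
def power_constant(anchor_points, exponent, trend_weight=6.4):
    """Divisor that keeps the window size unchanged at the anchor graph.

    For anchor_points=52 and exponent=1.2 this is roughly 14.1.
    """
    return anchor_points ** exponent / anchor_points * trend_weight


def power_window(data_points, exponent=1.2, anchor_points=52, trend_weight=6.4):
    """Window size computed as (data points)^exponent / constant."""
    constant = power_constant(anchor_points, exponent, trend_weight)
    return data_points ** exponent / constant


if __name__ == "__main__":
    # Compare a few exponents across the four graph sizes.
    for exponent in (1.0, 1.1, 1.2, 1.3):
        sizes = [round(power_window(n, exponent), 2) for n in (12, 24, 52, 365)]
        print(exponent, sizes)
```

With an exponent of 1.0 the formula reduces to data points / 6.4, so the loop also shows how far each power moves away from the original behaviour.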

Dan



Answer 2:

I voted this up for the quality of your results and the clarity of your write-up. I wish I could offer an answer that could improve on your already excellent work.

I fear that it might be a matter of trial and error with the trend weight until you see an improved fit.

It could be that you could make this an input from users as well: allow them to fiddle with the value, given realistic constraints, until they get satisfactory values.

I also wondered if the weight would be different for each graph, since the number of points in each is different. Are you trying to get a single weighting that works for all graphs?

Excellent work; a nice question. Well done. I wish I was more helpful. Perhaps someone else will have more wisdom to impart than I do.



Answer 3:

It might look like the trend lines are accurate in those four graphs, but they are really quite off. (This is best seen at the beginning of the lower-left graph and the beginning of the upper-right one.) I would think that you would want to use no fewer than half of your points when finding the trend line (though really you should use many more than half). I would suggest a trend weight of 2 at a maximum, though really you ought to stick closer to the 1-1.5 range. Since it is arbitrary, I would suggest you give your user an "accuracy of trend line" slider, where the most accurate setting uses a trend weight of 1 and the least accurate uses a weight of (number of data points + 1). The latter would use 0 points (assuming you always round down) and, I would assume, though your statistics software might be different, will generate a straight horizontal line.
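One way such a slider could be wired up is sketched below. The answer only fixes the two end points (a weight of 1 and a weight of data points + 1), so the linear interpolation between them, the 0-to-1 slider range, and the function names are assumptions made for illustration.

```python
import math


def slider_to_trend_weight(slider, data_points):
    """Map a slider position in [0, 1] to a trend weight.

    1.0 (most accurate) gives a weight of 1; 0.0 (least accurate) gives
    data_points + 1. Assumption: positions in between are interpolated
    linearly, which the answer does not specify.
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0 and 1")
    least_accurate = data_points + 1
    return least_accurate - slider * (least_accurate - 1)


def slider_window(slider, data_points):
    """Window size for a slider position, always rounded down."""
    return math.floor(data_points / slider_to_trend_weight(slider, data_points))


if __name__ == "__main__":
    for position in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(position, slider_window(position, 52))
```

At the least accurate setting the window size rounds down to 0, so the calling code would need to handle an empty window explicitly.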