I'm using the GROWTH (or LINEST or TREND or LOGEST, all make the same trouble) function in Excel 2003. But there is a problem that if some data is missing, the function refuses to give result:
You can download the file here.
Is there any workaround? Looking for easy and elegant solution.
I don't want the obvious workaround of getting rid of the missing value - that would mean to delete the column and that would also damage the graph, and it would make problems in my other tables where I have more rows and missing data in different columns. Other obvious workaround is to use one data for regression and the other for graph, but again, this is annoying and only makes mess in the sheet!!
Is there any way to tell excell - this value is NA?
Another idea would be to skip the missing value(s) in the expression. Is it possible to address a set of cells that is not continuous? Like instead of =GROWTH($B2:$AH2; $B1:$AH1; B1)
as in my example, use something like:
=GROWTH({$B2:$I2,$K2:$AH2}; {$B1:$I1,$K1:$AH1}; B1)
I'd of course like to avoid writing my own expressions. I need to explain this to my colleagues how to do all this and it would much more complicated. I want an easy and elegant solution.
I know that this is old...but in case you or someone else might still be looking for an answer, have you tried using the FORECAST
function? It will calculate the trend with missing values (as long as there aren't any "#N/A" cells).
In my case, I needed to create a gapless graph with missing values, but I also needed to calculate a trend from the data. So first I'd link the graph to a dataset that placed an #N/A for every missing value:
e.g., IF(ISBLANK(B2),NA(),B2)
But then I'd calculate the forecast numbers with the original data:
=FORECAST(B1,$B2:$AH2,$B1:$AH1)
Unless I'm missing something, that should take care of it. You basically end up with two rows of identical numbers, but one has blanks for the FORECAST
calculation, and the other replaces each blank with an NA()
for the graph.
This turns out to be "trivial" if you know the trick.
To use LINEST with missing values, you need to create the X-matrix (r rows by c columns) and Y-vector (r rows by one column) as usual. You also need to create an additional column in the X-matrix to serve as an indicator variable. Place this column immediately to the left of the X matrix. So, if the X matrix start in column B, put the additional column in column A. Set this indicator value to zero for every row you want to omit. Set this indicator value to one for every row you want to include. Multiply every other column in the X-matrix and the Y-vector by this indicator variable. Put this new augmented X matrix and new Y vector elsewhere in the spreadsheet. You should now have a new X-matrix (r rows by c+1 columns) and Y-vector with rows of zeros straight across for every row to be omitted. THIS IS CRITICAL!
Now use the LINEST function as normal , specifying the entire Y-vector and the expanded r×(c+1) X-matrix (with the indicator column included as the first two function parameters, "False" (that is, zero) as the third parameter and either "TRUE" (that is, one) or "FALSE" (that is, zero) as the fourth function parameter. The correct parameter estimates appear in the first row of the LINEST output. All the other LINEST output values are wrong except the value in the fifth row and second column (residual sum of squares) if you specified "TRUE" to get statistics.
If you specified the fourth function parameter as "TRUE" to get statistics, you need to correct the output for the incorrect values. The values in rows 2,3 and 4 of the expanded output are incorrect; the value in row 5 column 1 is also incorrect. You need to fix them.
Make a copy of the first row of the LINEST output elsewhere in the worksheet. If you specified "TRUE" for statistics, then reserve four blank rows below this copy. Copy row 5 column 2 value from the original LINEST output to row 5 column 2 of the new output space
Step One: Compute the correct number of degrees of freedom to replace the value in row 4 column 2 of the LINEST output. Find the number of parameters in the model; this is c+1. You could use the COUNT function to count the number of columns in the expanded X-matrix. Next add up all the values in the indicator column of the X-matrix. Suppose four rows have all zero values. Use the SUM function: this gives r – 4 = number of rows with "1" in the indicator column. The correct degrees of freedom is the difference: SUM(indicator column) – COUNT(augmented X-matrix columns). This is the value that should be placed in row 4 column 2 of the new output space.
Step Two: Fix row 2 and row 3 column 2. Divide the wrong d.f. (row 4 column 2) in the original LINEST output by the correct d.f. (row 4 column 2) in the new output space. Take the square root of this quotient. Multiple the values in row 2 and in row 3 column 2 in the original LINEST output space by this correction factor to get the correct standard errors of the parameters and the correct standard error of Y.
Step Three: Correct the sum of squares of regression. The original LINEST output has the value for the sum of squares due to regression uncorrected for the mean in row 5 column 1 of the output; we want the sum of squares of regression corrected for the mean. We need to calculate the correction for the mean. This is the sum of the Y vector values squared, divided by the sum of the indicator column values. Subtract this from the value in row 5 column 1 of the original LINEST output and put the answer in row 5 column 1 of the new output space.
Step Four: Correct the F-ratio in row 4 column 1. We need to compute the mean squares due to regression and due to the residuals. The mean square due to regression (numerator in the F-ratio) is the value in row 5 column 1 of the new output space divided by c, the number of columns in the original X matrix before augmentation. The mean square due to the residuals (denominator in the F-ratio) is row 5 column 2 of the new out put space divided by row 4 column 2 of the new output space. Compute the F-ratio from these two intermediate values and place the result in row 4 column 1 of the new output space.
Step five: Correct the R-square value in row 3 column 1. This is 1 - (row 5 column 2 divided by the sum of row 5 column 1 and row 5 column 2), using the values from the new output space.
Check your work: Make a copy of the augmented X matrix and Y vector elsewhere in the spreadsheet. Replace any entries with zero for those rows that have zero in the indicator variable. Delete all the cells in rows with zeros, shifting cells up. You should now have an X-matrix and Y vector with fewer rows, but no missing values. Delete the indicator column. Now use LINEST to fir a regression model to this reduced set of data, but this time set the third parameter to TRUE (include a constant). These results ought to be identical to the results you have in the new output space.
My solution consists of 2 parts:
- To avoid gap on the chart put in cells where data is missing
=NA()
function - it produces #N/A
error, and such error types are handled by charts exactly as you want: line is interpolated between available points that surround a missing one. Read more here: http://www.j-walk.com/ss/excel/usertips/tip024.htm
- If you need a trendline - why don't you use built-in routine for this? I added an exponential one to your data, and it fits 100% to your
GROWTH
calculated values. And it handles correctly #N/A
's as well. To make sure that trendline fits your data - simply replace #N/A
temporarily with average of two neighbor cells (297 for your sample) - it will calculates GROWTH
series then, and you'll see it's exactly fits added trendline. Read here about trendlines: http://office.microsoft.com/en-001/excel-help/add-a-trendline-to-a-chart-HP005198462.aspx and http://www.computergaga.com/excel/2003/intermediate/charts/add_a_trendline.html
Your file with the applied solutions is shared: https://www.dropbox.com/s/j7htrk9ih2jtcq6/TrendlineNA.xls
Hope that was helpful!