I am facing an issue with weighted-average logic in Excel.
I am looking at 4 fields for different deliverables: Total, Complete, Pending and Weight.
The weighted average for a particular deliverable is calculated as (Complete/Total) * Weight; for example, (5/10) * 0.20 = 10%.
For each deliverable, I have calculated the % and then added all of the percentages together:
Deliverables 1 to 10 = 10% + 20% + 5% + .... = 65%
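For reference, with a hypothetical layout where Complete is in B2, Total is in C2 and Weight is in D2, each deliverable's % is computed with something like:
=IFERROR(B2/C2*D2, 0)
(the IFERROR part just returns 0% whenever Total is 0 or one of the fields reads 'Not Applicable').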
My question is:
If, for a particular deliverable, the available field is 'Not Applicable' or '0', then the weighted average for that deliverable will be 0%, and the total therefore comes out lower than it should. Is there a better way to represent this? How can I solve this issue?
OK, so if I am understanding you correctly, you are saying the following:
(complete/total) * weight = weighted_average
Where this is one of many weighted averages that will be summed to provide a total_deliverable_value.
You have a case where this algorithm outputs 0% for a weighted_average due to a data issue. This is falsely lowering the total_deliverable_value.
Options:
- Add assumed values for missing data (clearly mark them as assumed rather than recorded fact)
- Have a standard weight for missing data. For example, all missing data gets 20%
- Go source the missing data
- Change how you are calculating the total_deliverable_value, i.e. instead of:
total_deliverable_value = sum(weighted_average::weighted_average)
use something like this:
total_deliverable_value = sum(weighted_average::weighted_average) / count_of_data_values
where count_of_data_values is the number of non-missing data values you had (an Excel sketch of this follows below).
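For example, in Excel, assuming a hypothetical layout with Total in C2:C11, Weight in D2:D11 and the per-deliverable weighted_average in E2:E11, that last option could look something like:
=SUM(E2:E11) / COUNTIF(C2:C11, ">0")
COUNTIF(C2:C11, ">0") only counts deliverables whose Total is a number greater than 0, so the 'Not Applicable' and 0 rows are left out of the divisor.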
Updated based on comments
You have to change the formula for all calculations. So take a current 'perfect' scenario of 100%, such as 5 weighted_averages where each weighted_average = 0.2 (20%).
Your original calculation would give this:
a) 0.2+0.2+0.2+0.2+0.2 = 1 (100%)
b) 0.2+NA+0.2+0.2+0.2 = 0.8 (80%)
This would indicate that b is not as good as a (let's not argue whether missing data means that is true), whereas you wish to show that they should actually be considered almost identical.
The new calculation would give:
a) 1/5 => 0.2 (20%)
b) 0.8/4 => 0.2 (20%)
Therefore both have the same total_deliverable_value. If you want to scale these back up to 100%, do this:
total_deliverable_value = (sum(weighted_average::weighted_average) / count_of_data_values) * count_of_expected_values
Now we are looking like this:
a) ((0.2+0.2+0.2+0.2+0.2)/5)*5 = 1 (100%)
b) ((0.2+NA+0.2+0.2+0.2)/4)*5 = 1 (100%)
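In Excel terms, keeping the same hypothetical layout as before (Total in C2:C11, weighted_average in E2:E11), the scaled-up version could be written as:
=SUM(E2:E11) / COUNTIF(C2:C11, ">0") * ROWS(C2:C11)
where ROWS(C2:C11) plays the role of count_of_expected_values (10 deliverables in this layout) and COUNTIF supplies the count of non-missing ones.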
UPDATE: Handling inconsistent max weighted_averages
The above all works because it is assumed that all the individual weighted_averages have a consistent max value, such as a max of 0.2 for 5 values:
[0.2, 0.1, 0.18, 0.08, 0.05] ==> 61%
Therefore we can afford to lose a few of the data points and still produce a total_deliverable_value that can be compared:
[0.2, 0.1, NA, NA, 0.05] ==> 58%
However, if the max values are inconsistent, say the 1st weighted_average max was 0.5, we get a break in our result:
[0.4, 0.09, NA, NA, 0.15] ==> 106%
where 100% should be our max total_deliverable_value. There is a simple solution to this: use the weighted_average_max value in the formula. We are no longer missing 40% of our values, we are missing 20%:
[0.4 (max 0.5), 0.09 (max 0.1), NA (max 0.1), NA (max 0.1), 0.15 (max 0.2)] ==> 80%
compared against something similar with all its data points:
[0.4, 0.09, 0.05, 0.03, 0.15] ==> 72%
This is the new formula:
total_deliverable_value = (sum(weighted_average::weighted_average) / portion_of_max_values_existing) * total_of_max_values
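Sticking with the hypothetical Excel layout (Total in C2:C11, Weight in D2:D11, weighted_average in E2:E11), and noting that each weighted_average's max value is simply its Weight, this could look something like:
=SUM(E2:E11) / SUMIF(C2:C11, ">0", D2:D11) * SUM(D2:D11)
SUMIF(C2:C11, ">0", D2:D11) adds up the weights of only those deliverables that actually have data (the portion_of_max_values_existing), and SUM(D2:D11) is the total_of_max_values.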