I'm looking to calculate statistics for a large data set on Excel and encountering some issues due to data set size.
It seems VBA may be the way to go, as copying AVERAGEIF and STDDEV array functions across data this size is causing long calculation times. Appreciate possible solutions or code that could be used here.
Goals:
- To calculate statistics (avg, std dev, z-scores) conditional on 2 identifiers (e.g. average of all heights at 01/01/10)
- Able to handle large data sets (100k+ data points)
Sample Data:
Date | User ID | Indicator | Data Point
01/01/10| 1 | Height | 150
01/01/10| 1 | Weight | 123
01/01/10| 2 | Height | 146
01/01/10| 2 | Weight | 123
01/02/10| 1 | Height | 156
01/02/10| 1 | Weight | 160
01/02/10| 2 | Height | 103
01/02/10| 2 | Weight | 109
Edit:
Expected output would ideally be as z-scores for each data point in a new column. Example: first z-score would be normalized for all heights on 01/01/10 with:
(150 - avg) / stdev
I don't know what is a z-score, since I'm getting same (+/-) value for all data points. But I'm confident that you will be able to modify the code to get what you want. Data is supposed to be located in sheet "Data", in which there is a command button named Go for the code to execute. Beware! Code is clearing all content from column E onwards.