Multi conditional statistics (avg, std dev, z-scor

2019-08-08 18:17发布

I'm looking to calculate statistics for a large data set on Excel and encountering some issues due to data set size.

It seems VBA may be the way to go, as copying AVERAGEIF and STDDEV array functions across data this size is causing long calculation times. Appreciate possible solutions or code that could be used here.

Goals:

  • To calculate statistics (avg, std dev, z-scores) conditional on 2 identifiers (e.g. average of all heights at 01/01/10)
  • Able to handle large data sets (100k+ data points)

Sample Data:

Date    | User ID | Indicator | Data Point
01/01/10| 1       | Height    | 150
01/01/10| 1       | Weight    | 123
01/01/10| 2       | Height    | 146
01/01/10| 2       | Weight    | 123
01/02/10| 1       | Height    | 156
01/02/10| 1       | Weight    | 160
01/02/10| 2       | Height    | 103
01/02/10| 2       | Weight    | 109

Edit:

Expected output would ideally be as z-scores for each data point in a new column. Example: first z-score would be normalized for all heights on 01/01/10 with:

(150 - avg) / stdev

1条回答
该账号已被封号
2楼-- · 2019-08-08 18:40

I don't know what is a z-score, since I'm getting same (+/-) value for all data points. But I'm confident that you will be able to modify the code to get what you want. Data is supposed to be located in sheet "Data", in which there is a command button named Go for the code to execute. Beware! Code is clearing all content from column E onwards.

    Dim lLastRowDB As Long
    Dim dU1 As Object, cU1 As Variant, iU1 As Long, lrU As Long
    Dim dU2 As Object, cU2 As Variant, iU2 As Long
    Dim MyArray() As Variant
    Dim lAV As Double
    Dim lSD As Double
    Dim i As Integer
    Dim j As Integer
    Dim k As Integer

    Private Sub Go_Click()
    Worksheets("Data").Columns("E:EZ").Delete Shift:=xlToLeft 'Clear previous results
    lLastRowDB = Worksheets("Data").Cells(2, 1).End(xlDown).Row 'Assuming your data starts in A2

    'Indexes from Column1 (Dates)
    Set dU1 = CreateObject("Scripting.Dictionary")
    lrU = Cells(Rows.Count, 1).End(xlUp).Row
    cU1 = Range("A2:A" & lrU)
    For iU1 = 1 To UBound(cU1, 1)
      dU1(cU1(iU1, 1)) = 1
    Next iU1

    'Indexes from Column3 (Indicators)
    Set dU2 = CreateObject("Scripting.Dictionary")
    cU2 = Range("C2:C" & lrU)
    j = 0
    For iU2 = 1 To UBound(cU2, 1)
      dU2(cU2(iU2, 1)) = 1
    Next iU2

    'If want to see values in dictionaries, uncomment following six lines

    'For i = 0 To dU1.Count - 1
    '    MsgBox "dU1 has " & dU1.Count & " elements and key#" & i & " is " & dU1.Keys()(i)
    'Next
    'For i = 0 To dU2.Count - 1
    '    MsgBox "dU2 has " & dU2.Count & " elements and key#" & i & " is " & dU2.Keys()(i)
    'Next

    'The following code will look in the complete set of data for each index
    'This accounts for unsorted data, but is resourse-consuming
    'If your data is ordered for shure, just loop the desired rows

    For i = 0 To dU1.Count - 1 'for each Date
        For j = 0 To dU2.Count - 1 'for each Indicator
            ReDim MyArray(1 To 1) As Variant 'reset the array
            For k = 2 To lLastRowDB 'Scan all rows
                If (Worksheets("Data").Cells(k, 1).Value = dU1.keys()(i)) Then
                    If (Worksheets("Data").Cells(k, 3).Value = dU2.keys()(j)) Then
                        MyArray(UBound(MyArray)) = Worksheets("Data").Cells(k, 4).Value 'add found value to array
                        ReDim Preserve MyArray(1 To UBound(MyArray) + 1) As Variant 'now array is 1 element longer
                    End If
                End If
            Next
            'Now MyArray contains desired data.
            'Get average and SD
            lAV = Application.WorksheetFunction.Average(MyArray)
            lSD = Application.WorksheetFunction.StDev(MyArray)
             'Titles
             Worksheets("Data").Cells(1, 5) = "Average"
             Worksheets("Data").Cells(1, 6) = "SD"
             Worksheets("Data").Cells(1, 7) = "z-scores"

            For k = 2 To lLastRowDB
                If (Worksheets("Data").Cells(k, 1).Value = dU1.keys()(i)) Then
                    If (Worksheets("Data").Cells(k, 3).Value = dU2.keys()(j)) Then
                        Worksheets("Data").Cells(k, 5) = lAV
                        Worksheets("Data").Cells(k, 6) = lSD
                        If lSD = 0 Then
                            Worksheets("Data").Cells(k, 7) = "SD is zero. Unable to calculate z-scores"
                        Else
                            Worksheets("Data").Cells(k, 7) = (Worksheets("Data").Cells(k, 4).Value - lAV) / lSD 'z-scores
                        End If
                    End If
                End If
            Next
        Next
    Next

    End Sub
查看更多
登录 后发表回答