How to find conditional cumulative sums in an exce

2019-09-22 11:52发布

Let's say I have two columns.

3.5463  11
4.5592  12
1.6993  111
0.92521 112
1.7331  121
2.1407  122
1.4082  1111
2.0698  1112
2.3973  1121
2.4518  1122
1.1719  1211
1.153   1212
0.67139 1221
0.64744 1222
1.3705  11111
0.9557  11112
0.64868 11121

0.7325  11211
0.58874 11212
0.86673 11221
0.17075 11222
0.64026 12111
0.80229 12112

0.43422 12122
1.0405  12211
0.63376 12212
0.56491 12221
0.34626 12222
0.81631 111111
0.91837 111112
0.70013 111121
0.87384 111122
1.1474  111211

0.47411 111221
0.12249 111222
0.56728 112111
0.88169 112112
0.14509 112121

0.68655 112211
0.36274 112212


1.1652  121111
0.99314 121112
0.42024 121121
0.23937 121122




1.0346  122111
0.64642 122112
0.15632 122121
0.41725 122122
0.40793 122211

In the first column, there is a number. With every one of those numbers, in the second column, is an associated ID. Now, there are some blank rows that do not contain any numbers in them.

Define one of these numbers to be a "daughter" of another number if the ID of the first number is the same as the ID of the second, with an extra digit on the end. For example, both IDs 11211 and 11212 are daughters of 1121, because the ID of 1121 has an extra digit, either a 1 or a 2, added onto the end to form the ID of its daughters. Thus, 1121 is the parent of both 11211 and 11212.

Here is what I want the macro to do. It must output a third column which contains, for every row, a cumulative sum of the number of the first column in that row, plus the parent number of that number, and the parent number of the parent number, etc. all the way up until it reachers either 11 or 12. It will begin by simply outputting the numbers in column 1 for 11 and 12 in the third column. Then, in a loop beginning with 111, it will add up the cumulative sum of every row (the number in that row plus the third column output of the parent), only if that row has a number and an id, and only if the parent exists and has an output in column 3. So for example, the number in the 3rd column of the row with ID 11222 should be the number in column 1 of that row, plus that of 1122, plus that of 112, plus that of 11. So, 0.17075+2.4518+0.92521+3.5463, or 7.09406. However, if you try to do this for ID 111221, you will notice that the row where the parent 11122 should be is empty. Thus, the parent does not exist, and no value will be outputted in column 3 for 111221.

I would greatly appreciate it if someone has some time on their hands to code up this VBA macro for me in exchange for an accepted solution.

Thanks

3条回答
疯言疯语
2楼-- · 2019-09-22 12:12

I don't think a macro is needed, just some formulas. First, I put a header on my columns of data, such as "value," and "id." If you then highlight the column labels (i.e., A and B) and sort by B ("id") then A ("value"), you'll group your blank rows. You can then delete those rows. Now you have the data almost ready. When I did this, I converted the id column to text, as opposed to a number value, so if I sort the table by id, the pattern will be, "11, 111, 1111," and so on, instead of, "11, 12, 111, 112, 121." Then, I added columns to separate the separate characters or levels of the ids. This is to help with parents and children. You can use text-to-columns, or a MID formula, but what I did was have 6 more columns to the right. For each id row, each column would either have a "1," a "2," or a blank (null) value. Then I added another column, calling it "level." I used a formula like COUNTA across all my id splitting columns. So, for 11, my level value was 2. 111 would be 3, 11221 would be 5, and so on. This gives me the id level (parent, child, grandchild, etc). Then I added my final column to the right to compute my cumulative sum of the values. In concept I have one big nested IF statement, but in practice, I needed two. My formula says, if the row above me has a lower level number (i.e., it is some kind of parent), add the value of the current row to the value of the above row. Otherwise, keep going up a row till I do get a parent, and add the current row value to that number.

My final formula for all but the first 5 rows of data was (in the 6th row of data): =if(K6

查看更多
Bombasti
3楼-- · 2019-09-22 12:28

So for example, the number in the 3rd column of the row with ID 11222 should be the number in column 1 of that row, plus that of 1122, plus that of 112, plus that of 11. So, 0.17075+2.4518+0.92521+3.5463, or 7.09406.
However, if you try to do this for ID 111221, you will notice that the row where the parent 11122 should be is empty. Thus, the parent does not exist, and no value will be outputted in column 3 for 111221.

As a native worksheet array formula¹ in D1,

=IF(LEN(B1), SUM(SUMIFS(A$1:INDEX(A:A, MATCH(1E+99, A:A)), 
     B$1:INDEX(B:B, MATCH(1E+99, A:A)), LEFT(B1, ROW(INDIRECT("2:"&LEN(B1)))))), TEXT(,))

The above does not compensate for missing parents (null string). It totals everything it can find and uses zero for missing parents.

As a VBA UDF² in E1,

Function conditionalCumulativeSum(nums As Range, _
        ids As Range, sib As Range, _
        Optional nullOnBlank As Boolean = True)
    Dim i As Integer

    'truncate any full column reference to the UsedRange
    Set nums = Intersect(nums, nums.Parent.UsedRange)

    'match the nums and ids ranges
    Set ids = ids.Resize(nums.Rows.Count, nums.Columns.Count)

    For i = Len(sib.Value2) To 2 Step -1
        If nullOnBlank And IsError(Application.Match(--Left(sib, i), ids, 0)) Then
            conditionalCumulativeSum = vbNullString
            Exit For
        End If
        conditionalCumulativeSum = conditionalCumulativeSum + _
            Application.SumIfs(nums, ids, Left(sib, i))
    Next i

    If i = 0 Then conditionalCumulativeSum = vbNullString
End Function

The above defaults to return a null string when it encounters any missing parent through the hereditary chain. This can be turned off by adding FALSE as the optional fourth parameter and then the UDF will behave identically to the native formula.

Results from sample data

    conditionalCumulativeSum


¹ Array formulas need to be finalized with Ctrl+Shift+Enter↵. If entered correctly, Excel with wrap the formula in braces (e.g. { and }). You do not type the braces in yourself. Once entered into the first cell correctly, they can be filled or copied down or right just like any other formula. Try and reduce your full-column references to ranges more closely representing the extents of your actual data. Array formulas chew up calculation cycles logarithmically so it is good practise to narrow the referenced ranges to a minimum. See Guidelines and examples of array formulas for more information.

² A User Defined Function (aka UDF) is placed into a standard module code sheet. Tap Alt+F11 and when the VBE opens, immediately use the pull-down menus to Insert ► Module (Alt+I,M). Paste the function code into the new module code sheet titled something like Book1 - Module1 (Code). Tap Alt+Q to return to your worksheet(s).

查看更多
孤傲高冷的网名
4楼-- · 2019-09-22 12:33

rest of answer is below

=if(K6<K7,L6+C7,if(K5<K7,L5+C7,if(K4<K7,L4+C7,if(K3<K7,L3+C7,if(K2<K7,L2+C7,C7)))))

The values were column C, the original id in column D, the id split columns were E through J, the level column was K, and my formula was in L. This formula can be copied down the table. For the first 4 rows, you just need 1 less IF statement each row you go up. The fifth row of data might take the above formula; it depends how it will deal with the column headers in row one. The formula on the 4 row of data might be:

=if(K4<K5,L4+C5,if(K3<K5,L3+C5,if(K2<K5,L2+C5,if(K1<K5,L1+C5,C5))))

I'm still learning how to format these comments, so I'll try to provide a sample of the layout I have...

     C          D       E   F   G   H   I   J   K    L

1   value       id      1   2   3   4   5   6   lvl cumul_sum  

2   3.546300    11      1   1                   2   3.546300  
3   1.699300    111     1   1   1               3   5.245600  
4   1.408200    1111    1   1   1   1           4   6.653800  
5   1.370500    11111   1   1   1   1   1       5   8.024300  
6   0.816310    111111  1   1   1   1   1   1   6   8.840610  
7   0.918370    111112  1   1   1   1   1   2   6   8.942670  
8   0.955700    11112   1   1   1   1   2       5   7.609500  
查看更多
登录 后发表回答