可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I'm having a fairly large dataset where I need to combine multiple entries into a single value. My dataset contains data on the combination of two datasets, each using their own ID's and keys.
I thought of using a Sumproduct()
function like this:
=SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2);--('Raw data'!O:O={20;21;22;23;40});'Raw data'!S:S)
With Landgebruik!A2
holding an ID for the first dataset, which I need to aggregate the second dataset to.
'Raw data'!O:O
contains the ID's from the second dataset. In the case above I need to sum the area (in 'Raw data'!S:S
) when the value of the second ID is any of these values: {20;21;22;23;40}
. (OR logic) The column only contains integer values.
Is there any other way of fixing this then duplicating --('Raw data'!O:O=20)
for all values in the array?
EDIT:
I went with the work-around for now, which was: =SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2);--('Raw data'!O:O=20)+('Raw data'!O:O=20)+('Raw data'!O:O=21)+('Raw data'!O:O=22)+('Raw data'!O:O=23)+('Raw data'!O:O=40);'Raw data'!S:S)
. But I feel that there should be a more elegant way of doing this.
回答1:
You could make a small change to your current formula; change the ;
to *
(--
are also unneeded in that particular case):
=SUMPRODUCT(('Raw data'!C:C=Landgebruik!A2)*('Raw data'!O:O={20;21;22;23;40})*'Raw data'!S:S)
And that should work.
When you feed separate parameters to SUMPRODUCT
, each parameter has to be of the same size. But when you multiply them like this, it forces evaluation and the arrays expand.
For example, if you take two arrays, 5x1 and 1x5, you get a 5x5 resulting array:
回答2:
You can use text search for this:
--NOT(ISERROR(FIND('Raw data'!O:O,"2021222340")))
But you have to be careful that a shorter ID is not found incorrectly in a longer ID, e.g. if you want to search among the IDs { 123, 456, 789 } then 12 is not considered to be among the IDs. So a simple text search like the above would not work. You need a delimiter character to break up the string of IDs. Usually I use the pipe character for this purpose, since I cannot remember any case when it occurred in the original text of an Excel file, and because it makes the formula human-readable:
--NOT(ISERROR(FIND("|"&'Raw data'!O:O&"|","|20|21|22|23|40|")))
Examples:
'Raw data'!O:O is 20 => |21| is found in |20|21|22|23|40|
'Raw data'!O:O is 2 => |2| is not found in |20|21|22|23|40|
(If your IDs may include the pipe character, then you can use CHR(1), a long forgotten ASCII code for SOH meaning start of header; of course, it's less readable.)
The whole formula:
=SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2),--NOT(ISERROR(FIND("|"&'Raw data'!O:O&"|","|20|21|22|23|40|"))),'Raw data'!S:S)
(Sorry, my Excel uses , instead of ;)
回答3:
Even though this has been done hundreds of times before, hey maybe microsoft switched up the formulas or something.
I am partial to the method Jerry and Me suggested as they are simple as hell and concise, but you pay a heavy performance cost.
Tom's formula looks ugly to me but was fastest by far, about 4x faster than my initial example. We were able to incorporate the {}s with Tom's formula, but to get it to work we had to wrap the sumifs function with a sum function. This slowed down the formula considerably but made it prettier.
z32a7ul had a great solution too. I really like the use of -- and learned how to use |s to search for a text and only that text. At first glance I thought that it would not work on a number such as 2323 but it does.
Mock up example was as follows:
A1:A5000 was filled with LandgeBruik,
B1:B5000 was filled with 40's
C1:5000 was filled with 1's.
The results:
=SUMPRODUCT((A1:A5000="LandgeBruik")*(B1:B5000={20,21,22,23,40})*C1:C5000)
19.186031 seconds elapsed
59,818,073 ticks
{=SUM(IF(A1:A5000="Landgebruik",1,0)*IF(B1:B5000={20,21,22,23,40},1,0)*C1:C5000)}
26.124411 seconds elapsed
81,450,506 ticks
{=SUM((A1:A5000=""Landgebruik"")*(B1:B5000={20,21,22,23,40})*C1:C5000)}
21.111835 seconds elapsed
65,822,330 ticks
"=SUMIFS(C1:C5000,B1:B5000,"">=20"",B1:B5000,""<=23"",A1:A5000,""=Landgebruik"")+SUMIFS(C1:C5000,B1:B5000,""=40"",A1:A5000,""=Landgebruik"")"
6.732804 seconds elapsed
20,991,490 ticks
"=SUM(SUMIFS(C1:C5000,A1:A5000,"Landgebruik",B1:B5000,{21,22,23,24,40}))"
16.954528 seconds elapsed
52,860,709 ticks
"=SUMPRODUCT(--(A1:A5000=""Landgebruik""),--NOT(ISERROR(FIND(""|""&B1:B5000&""|"",""|20|21|22|23|40|""))),C1:C5000)"
11.822379 seconds elapsed
36,859,729 ticks
Handy class TimerWin64 used to time these:
Option Explicit
Private Declare PtrSafe Function QueryPerformanceCounter Lib "kernel32" (lpPerformanceCount As LongInteger) As Long
Private Declare PtrSafe Function QueryPerformanceFrequency Lib "kernel32" (lpFrequency As LongInteger) As Long
Private Type LongInteger
First32Bits As Long
Second32Bits As Long
End Type
Private Type TimerAttributes
CounterInitial As Double
CounterNow As Double
PerformanceFrequency As Double
End Type
Private Const MaxValue_32Bits = 4294967296#
Private this As TimerAttributes
Private Sub Class_Initialize()
PerformanceFrequencyLet
End Sub
Private Sub PerformanceFrequencyLet()
Dim TempFrequency As LongInteger
QueryPerformanceFrequency TempFrequency
this.PerformanceFrequency = ParseLongInteger(TempFrequency)
End Sub
Public Sub CounterInitialLet()
Dim TempCounterIntital As LongInteger
QueryPerformanceCounter TempCounterIntital
this.CounterInitial = ParseLongInteger(TempCounterIntital)
End Sub
Public Sub PrintTimeElapsed()
CounterNowLet
If CounterInitalIsSet = True Then
Dim TimeElapsed As Double
TimeElapsed = (this.CounterNow - this.CounterInitial) / this.PerformanceFrequency
Debug.Print Format(TimeElapsed, "0.000000"); " seconds elapsed "
Dim TicksElapsed As Double
TicksElapsed = (this.CounterNow - this.CounterInitial)
Debug.Print Format(TicksElapsed, "#,##0"); " ticks"
End If
End Sub
Private Function CounterNowLet()
Dim TempTimeNow As LongInteger
QueryPerformanceCounter TempTimeNow
this.CounterNow = ParseLongInteger(TempTimeNow)
End Function
Private Function CounterInitalIsSet() As Boolean
If this.CounterInitial = 0 Then
MsgBox "Counter Initial Not Set"
CounterInitalIsSet = False
Else
CounterInitalIsSet = True
End If
End Function
Private Function ParseLongInteger(ByRef LongInteger As LongInteger) As Double
Dim First32Bits As Double
First32Bits = LongInteger.First32Bits
Dim Second32Bits As Double
Second32Bits = LongInteger.Second32Bits
If First32Bits < 0 Then First32Bits = First32Bits + MaxValue_32Bits
If Second32Bits < 0 Then Second32Bits = First32Bits + MaxValue_32Bits
ParseLongInteger = First32Bits + (MaxValue_32Bits * Second32Bits)
End Function
Here is the code used to run the test:
Option Explicit
Sub testFunctions()
With Application
.Calculation = xlCalculationManual
.ScreenUpdating = False
.DisplayStatusBar = False
.EnableEvents = False
End With
Dim ws As Worksheet
Set ws = Sheets("Test")
Dim Index As Long
Dim Timer As TimerWin64
Set Timer = New TimerWin64
Timer.CounterInitialLet
For Index = 1 To 5000
Dim rngAddress As String
rngAddress = "D" & Index
ws.Range(rngAddress).Formula = "=SUMPRODUCT(--(A1:A5000=""Landgebruik""),--NOT(ISERROR(FIND(""|""&B1:B5000&""|"",""|20|21|22|23|40|""))),C1:C5000)"
Next Index
Timer.PrintTimeElapsed
With Application
.Calculation = xlCalculationAutomatic
.ScreenUpdating = True
.DisplayStatusBar = True
.EnableEvents = True
End With
End Sub
回答4:
You could split it into two SUMIFS as mentioned in the comment. If all values are integers, then comparing 'Raw data'!O:O to 20,21,22 and 23 is the same as testing it for >=20 and <=23. The value 40 has to be done separately.
=SUMIFS('Raw Data'!S:S,'Raw Data'!C:C,Landgebruik!A2,'Raw Data'!O:O,">="&20,'Raw Data'!O:O,"<="&23)
+SUMIFS('Raw Data'!S:S,'Raw Data'!C:C,Landgebruik!A2,'Raw Data'!O:O,40)
in my locale
or
=SUMIFS('Raw Data'!S:S;'Raw Data'!C:C;Landgebruik!A2;'Raw Data'!O:O;">="&20;'Raw Data'!O:O;"<="&23)
+SUMIFS('Raw Data'!S:S;'Raw Data'!C:C;Landgebruik!A2;'Raw Data'!O:O;40)
in your locale.
This only works when several of the criteria are consecutive integers.
Speed considerations
SUMIFS is thought to be about five times faster than sumproduct so may be the preferred option for large datasets as demonstrated here
You could argue that the more general suggestion of (effectively) five SUMIFS within a SUM from @ BrakNicku should be about as fast as one SUMPRODUCT, but the SUM(SUMIFS) would probably still win because formulas like SUMIFS handle full-column references more efficiently than array formulas.
回答5:
I want to give a shot to this question, after asking OP for some clarifications, because English is not my main language and I think I have misunderstood something.
So, what I did to simulate situation, made a new workbook with 2 sheets.
One sheet is named Landgebruik
and got a value in A2
and I did this:
The second sheet is named Raw data
. I hide some columns to use only columns C, O and S. In column S I input just values equal to 1. In column O I did randomly values equal to {20,21,22,23,40}
and in Column C I did randomly values which were A or B. And it looks like this (please, note I hide some columns):
And question would like to sum values in column S but only If column O is equal to 20 or 21 or 22 or 23 o 40 and column C is equal to Landgebruik!A2
(in my test, value in there is letter A
)
We can use an array formula to filter the data in column S and then, once filtered, sum values that meet requirements. In my test, the correct result would be 8, beause only 8 values in column S meet requirements of column C and O. In the image, the right rows are highlighted in yellow.
OP already did this, but wants to know if there is a shorter/elegant formula.
Shortest formula I found is like this:
=SUM(IF($O$2:$O$28={20;21;22;23;40};IF($C$2:$C$28=Landgebruik!$A$2;$S$2:$S$28)))
This is an array formula, so it must be inserted pressing
CTRL+SHIFT+ENTER or it won't work!
HOW IT WORKS:
First IF
takes all values in column S and ignores all where equivalent in Column O are not 20 or 21 or 22 or 23 or 40. Second IF
takes that new array, and ignores all values where equivalent in column C are not equal to Landgebruik!$A$2
. Final array is sumed up by the function SUM
I've tried to explain the best I can. I hope you can adapt this to your needs.
回答6:
If you are interested in performance (calculation speed) and are not afraid of matrix calculation, you can use MMULT:
=SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2),MMULT(--('Raw data'!O:O={20,21,22,23,24}),TRANSPOSE({1,1,1,1,1})),'Raw data'!S:S)
Explanation:
First, you create a 1048576×5 matrix, where the value in the i-th row and j-th column is 1 if the ID in 'Raw data'!O:O's i-th line is the same as the j-th value in the enumeration {20,21,22,23,24}, 0 otherwise.
Second, you multiply this by a vector of 1s (5 1s because {20,21,22,23,24} contains five elements), which means that you accept all the five values.
Third, from the above you get a vector where the i-th element is 1 if the ID is among the accepted values, 0 otherwise, and you put this vector next to the others in your SUMPRODUCT.
(Sorry, my Excel uses ',' instead of ';'. If you want to shorten the formula, you may write {1;1;1;1;1} instead of TRANSPOSE({1,1,1,1,1}). But you have to find out what your Excel uses instead of ';' to separate rows, most probably '.'.)
Note: It may improve the speed of the calculation if you refer to the range which actualy contain values, not the whole column, e.g. 'Raw data'!C1:C123 instead of 'Raw data'!C:C.
If you insert new rows with Shift+Space Ctrl++ above the last row already included, then the references in your formulas will be updated automatically. Alternatively, you may use Names with special formulas that grow the Range referred to by determining the last non-empty cell.
Update
I made some measurements to compare the efficiency of these approaches. I used random data of 10000 rows and I recalculated each formula 1000 times. You can see the elapsed time in the second column.
I commented out the other formulas while I ran this VBA code to measure the time:
Public Sub MeasureCalculationTime()
Dim datStart As Date: datStart = Now
Dim i As Long: For i = 1 To 1000
Application.Calculate
Next i
Dim datFinish As Date: datFinish = Now
Dim dblSeconds As Double: dblSeconds = (datFinish - datStart) * 24 * 60 * 60
Debug.Print "Calculation finished at " & datFinish; " took " & dblSeconds & " seconds"
End Sub
In this scenario, MMULT was not the fastest.
However, I would like to point out that it is the most flexible because
You may use it with switches: You refer to a cell range instead of the {1,1,1,1,1}, and you will be able to include / exclude IDs in the selection very quickly. Like you put into A1:A5 {20,21,22,23,24} and next to it, into B1:B5 {1,1,1,1,1}. If you want to exclude 21, then you rewrite B2 to 0, if you want to include it, you write it back to 1.
You may use more complicated criteria, where you have to compare multiple levels. Like:
=SUMPRODUCT(MMULT(--(CarId=CarOwner),--(CarOwner=ListOfJobs),--(ListOfJobs=JobsByDepartment),--(DepartmentIncludedInSelection=1)),FuelConsumption)
Note: The above line is just pseudocode, MMULT has only two parameters.
回答7:
This could work:
={SUMPRODUCT(--('Raw data'!C:C=Landgebruik!A2);--IFERROR(MATCH('Raw data'!O:O;{20;21;22;23;40};0)>0;0);'Raw data'!S:S)}
This needs to be entered as an array formula.