What I'm trying to do
I'm trying to write a function to subtract Excel ranges. It should take two input parameters: range A and range B. It should return a range object consisting of cells that are part of range A and are not part of range B (as in set subtraction)
What I've tried
I've seen some examples on the web that use a temporary worksheet to do this (fast, but might introduce some issues with protected workbooks and such) and some other examples that go cell by cell through the first range checking for intersections with the second one (extremely slow).
After some thinking I've come up with this code {1}, which works faster, but still is slow. Subtracting from a range representing the whole worksheet takes from 1 to 5 minutes depending on how complex the second range is.
When I looked over that code trying to find ways to make it faster I saw a possibility for applying the divide-and-conquer paradigm, which I did {2}. But that had made my code slower instead. I'm not much of a CS guy, so I might have done something wrong or this algorithm simply is not the one the divide-and-conquer should be used on, I don't know.
I have also tried rewriting it using mostly recursion, but that took forever to finish or (more often) had thrown Out of Stack Space errors. I didn't save the code.
The only (marginally) successful improvement I've been able to do is adding a flip switch {3} and going first through rows, then (in the next call) through columns instead of going through both in the same call, but the effect was not as good as I've hoped. Now I see that even though we don't go through all rows in the first call, in the second call we still loop through the same amount of rows we would in the first one, only these rows are a little bit shorter :)
I would appreciate any help in improving or rewriting this function, thank you!
The solution, based on the accepted answer by Dick Kusleika
Dick Kusleika, thank you very much for providing your answer! I think I'll use it with some modifications I've made:
- Got rid of the global variable (mrBuild)
- Fixed "some overlap" condition to exclude "no overlap" case
- Added more complex conditions to choose whether to split the range top to bottom or left to right
With these modifications the code runs very fast on the most of common cases. As it's been pointed out, it will still be slow with checkerboard-style huge range which I agree is unavoidable.
I think this code still has a room for improvement and I'll update this post in case I modify it.
Improvement possibilities:
- Heuristics of choosing how to split the range (by columns or by rows)
{0} Solution code
Public Function SubtractRanges(rFirst As Range, rSecond As Range) As Range
'
' Returns a range of cells that are part of rFirst, but not part of rSecond
' (as in set subtraction)
'
' This function handles big input ranges really well!
'
' The reason for having a separate recursive function is
' handling multi-area rFirst range
'
Dim rInter As Range
Dim rReturn As Range
Dim rArea As Range
Set rInter = Intersect(rFirst, rSecond)
Set mrBuild = Nothing
If rInter Is Nothing Then 'no overlap
Set rReturn = rFirst
ElseIf rInter.Address = rFirst.Address Then 'total overlap
Set rReturn = Nothing
Else 'partial overlap
For Each rArea In rFirst.Areas
Set mrBuild = BuildRange(rArea, rInter) 'recursive
Next rArea
Set rReturn = mrBuild
End If
Set SubtractRanges = rReturn
End Function
Private Function BuildRange(rArea As Range, rInter As Range, _
Optional mrBuild As Range = Nothing) As Range
'
' Recursive function for SubtractRanges()
'
' Subtracts rInter from rArea and adds the result to mrBuild
'
Dim rLeft As Range, rRight As Range
Dim rTop As Range, rBottom As Range
Dim rInterSub As Range
Dim GoByColumns As Boolean
Set rInterSub = Intersect(rArea, rInter)
If rInterSub Is Nothing Then 'no overlap
If mrBuild Is Nothing Then
Set mrBuild = rArea
Else
Set mrBuild = Union(mrBuild, rArea)
End If
ElseIf Not rInterSub.Address = rArea.Address Then 'some overlap
If Not rArea.Cells.CountLarge = 1 Then 'just in case there is only one cell for some impossible reason
' Decide whether to go by columns or by rows
' (helps when subtracting whole rows/columns)
If Not rInterSub.Columns.Count = rArea.Columns.Count And _
((Not rInterSub.Cells.CountLarge = 1 And _
(rInterSub.Rows.Count > rInterSub.Columns.Count _
And rArea.Columns.Count > 1) Or (rInterSub.Rows.Count = 1 _
And Not rArea.Columns.Count = 1)) Or _
(rInterSub.Cells.CountLarge = 1 _
And rArea.Columns.Count > rArea.Rows.Count)) Then
GoByColumns = True
Else
GoByColumns = False
End If
If Not GoByColumns Then
Set rTop = rArea.Resize(rArea.Rows.Count \ 2) 'split the range top to bottom
Set rBottom = rArea.Resize(rArea.Rows.Count - rTop.Rows.Count).Offset(rTop.Rows.Count)
Set mrBuild = BuildRange(rTop, rInterSub, mrBuild) 'rerun it
Set mrBuild = BuildRange(rBottom, rInterSub, mrBuild)
Else
Set rLeft = rArea.Resize(, rArea.Columns.Count \ 2) 'split the range left to right
Set rRight = rArea.Resize(, rArea.Columns.Count - rLeft.Columns.Count).Offset(, rLeft.Columns.Count)
Set mrBuild = BuildRange(rLeft, rInterSub, mrBuild) 'rerun it
Set mrBuild = BuildRange(rRight, rInterSub, mrBuild)
End If
End If
End If
Set BuildRange = mrBuild
End Function
Other code mentioned in the question
{1} Initial code (go row by row, column by column)
Function SubtractRanges(RangeA, RangeB) As Range
'
' Returns a range of cells that are part of RangeA, but not part of RangeB
'
' This function handles big RangeA pretty well (took less than a minute
' on my computer with RangeA = ActiveSheet.Cells)
'
Dim CommonArea As Range
Dim Result As Range
Set CommonArea = Intersect(RangeA, RangeB)
If CommonArea Is Nothing Then
Set Result = RangeA
ElseIf CommonArea.Address = RangeA.Address Then
Set Result = Nothing
Else
'a routine to deal with A LOT of cells in RangeA
'go column by column, then row by row
Dim GoodCells As Range
Dim UnworkedCells As Range
For Each Area In RangeA.Areas
For Each Row In Area.Rows
Set RowCommonArea = Intersect(Row, CommonArea)
If Not RowCommonArea Is Nothing Then
If Not RowCommonArea.Address = Row.Address Then
Set UnworkedCells = AddRanges(UnworkedCells, Row)
End If
Else
Set GoodCells = AddRanges(GoodCells, Row)
End If
Next Row
For Each Column In Area.Columns
Set ColumnCommonArea = Intersect(Column, CommonArea)
If Not ColumnCommonArea Is Nothing Then
If Not ColumnCommonArea.Address = Column.Address Then
Set UnworkedCells = AddRanges(UnworkedCells, Column)
End If
Else
Set GoodCells = AddRanges(GoodCells, Column)
End If
Next Column
Next Area
If Not UnworkedCells Is Nothing Then
For Each Area In UnworkedCells
Set GoodCells = AddRanges(GoodCells, SubtractRanges(Area, CommonArea))
Next Area
End If
Set Result = GoodCells
End If
Set SubtractRanges = Result
End Function
{2} Divide and conquer
Function SubtractRanges(RangeA, RangeB) As Range
'
' Returns a range of cells that are part of RangeA, but not part of RangeB
'
Dim CommonArea As Range
Dim Result As Range
Set CommonArea = Intersect(RangeA, RangeB)
If CommonArea Is Nothing Then
Set Result = RangeA
ElseIf CommonArea.Address = RangeA.Address Then
Set Result = Nothing
Else
'a routine to deal with A LOT of cells in RangeA
'go column by column, then row by row
Dim GoodCells As Range
Dim UnworkedCells As Range
For Each Area In RangeA.Areas
RowsNumber = Area.Rows.Count
If RowsNumber > 1 Then
Set RowsLeft = Range(Area.Rows(1), Area.Rows(RowsNumber / 2))
Set RowsRight = Range(Area.Rows(RowsNumber / 2 + 1), Area.Rows(RowsNumber))
Else
Set RowsLeft = Area
Set RowsRight = CommonArea.Cells(1, 1) 'the next best thing to Nothing - will end its cycle rather fast and won't throw an error with For Each statement
End If
For Each Row In Array(RowsLeft, RowsRight)
Set RowCommonArea = Intersect(Row, CommonArea)
If Not RowCommonArea Is Nothing Then
If Not RowCommonArea.Address = Row.Address Then
Set UnworkedCells = AddRanges(UnworkedCells, Row)
End If
Else
Set GoodCells = AddRanges(GoodCells, Row)
End If
Next Row
ColumnsNumber = Area.Columns.Count
If ColumnsNumber > 1 Then
Set ColumnsLeft = Range(Area.Columns(1), Area.Columns(ColumnsNumber / 2))
Set ColumnsRight = Range(Area.Columns(ColumnsNumber / 2 + 1), Area.Columns(ColumnsNumber))
Else
Set ColumnsLeft = Area
Set ColumnsRight = CommonArea.Cells(1, 1)
End If
For Each Column In Array(ColumnsLeft, ColumnsRight)
Set ColumnCommonArea = Intersect(Column, CommonArea)
If Not ColumnCommonArea Is Nothing Then
If Not ColumnCommonArea.Address = Column.Address Then
Set UnworkedCells = AddRanges(UnworkedCells, Column)
End If
Else
Set GoodCells = AddRanges(GoodCells, Column)
End If
Next Column
Next Area
If Not UnworkedCells Is Nothing Then
For Each Area In UnworkedCells
Set GoodCells = AddRanges(GoodCells, SubtractRanges(Area, CommonArea))
Next Area
End If
Set Result = GoodCells
End If
Set SubtractRanges = Result
End Function
{3} Initial code + flip switch (row by row OR column by column in turns)
Function SubtractRanges(RangeA, RangeB, Optional Flip As Boolean = False) As Range
'
' Returns a range of cells that are part of RangeA, but not part of RangeB
'
' This function handles big RangeA pretty well (took less than a minute
' on my computer with RangeA = ActiveSheet.Cells)
'
Dim CommonArea As Range
Dim Result As Range
Set CommonArea = Intersect(RangeA, RangeB)
If CommonArea Is Nothing Then
Set Result = RangeA
ElseIf CommonArea.Address = RangeA.Address Then
Set Result = Nothing
Else
'a routine to deal with A LOT of cells in RangeA
'go column by column, then row by row
Dim GoodCells As Range
Dim UnworkedCells As Range
For Each Area In RangeA.Areas
If Flip Then
For Each Row In Area.Rows
Set RowCommonArea = Intersect(Row, CommonArea)
If Not RowCommonArea Is Nothing Then
If Not RowCommonArea.Address = Row.Address Then
Set UnworkedCells = AddRanges(UnworkedCells, Row)
End If
Else
Set GoodCells = AddRanges(GoodCells, Row)
End If
Next Row
Else
For Each Column In Area.Columns
Set ColumnCommonArea = Intersect(Column, CommonArea)
If Not ColumnCommonArea Is Nothing Then
If Not ColumnCommonArea.Address = Column.Address Then
Set UnworkedCells = AddRanges(UnworkedCells, Column)
End If
Else
Set GoodCells = AddRanges(GoodCells, Column)
End If
Next Column
End If
Next Area
If Not UnworkedCells Is Nothing Then
For Each Area In UnworkedCells
Set GoodCells = AddRanges(GoodCells, SubtractRanges(Area, CommonArea, Not Flip))
Next Area
End If
Set Result = GoodCells
End If
Set SubtractRanges = Result
End Function
A little helper function mentioned here and there:
Function AddRanges(RangeA, RangeB)
'
' The same as Union built-in but handles empty ranges fine.
'
If Not RangeA Is Nothing And Not RangeB Is Nothing Then
Set AddRanges = Union(RangeA, RangeB)
ElseIf RangeA Is Nothing And RangeB Is Nothing Then
Set AddRanges = Nothing
Else
If RangeA Is Nothing Then
Set AddRanges = RangeB
Else
Set AddRanges = RangeA
End If
End If
End Function