I was working on a solution to another question of mine when I stumble across this helpful question and answer. However implementing the answer given by Control Freak over there throws me a Type Mismatch
error as soon as I exit the function and return to my code on the line: Years = ReDimPreserve(Years, i, 3)
. I'm not that skilled of a programmer to figure out what is going wrong here, so can anybody shed some light on this.
Here is my code:
Sub DevideData()
Dim i As Integer
Dim Years() As String
ReDim Years(1, 3)
Years(1, 1) = Cells(2, 1).Value
Years(1, 2) = 2
i = 2
ThisWorkbook.Worksheets("Simple Boundary").Activate
TotalRows = ThisWorkbook.Worksheets("Simple Boundary").Range("A100000").End(xlUp).row
For row = 3 To TotalRows
Years = ReDimPreserve(Years, i, 3)
If Not Cells(row, 1).Value = Cells(row - 1, 1).Value Then
Years(i - 1, 3) = row - 1
Years(i, 1) = Cells(row, 1).Value
Years(i, 2) = row
i = i + 1
End If
Next row
End Sub
And here is the function as written by Control Freak:
Public Function ReDimPreserve(aArrayToPreserve, nNewFirstUBound, nNewLastUBound)
ReDimPreserve = False
'check if its in array first
If IsArray(aArrayToPreserve) Then
'create new array
ReDim aPreservedArray(nNewFirstUBound, nNewLastUBound)
'get old lBound/uBound
nOldFirstUBound = UBound(aArrayToPreserve, 1)
nOldLastUBound = UBound(aArrayToPreserve, 2)
'loop through first
For nFirst = LBound(aArrayToPreserve, 1) To nNewFirstUBound
For nLast = LBound(aArrayToPreserve, 2) To nNewLastUBound
'if its in range, then append to new array the same way
If nOldFirstUBound >= nFirst And nOldLastUBound >= nLast Then
aPreservedArray(nFirst, nLast) = aArrayToPreserve(nFirst, nLast)
End If
Next
Next
'return the array redimmed
If IsArray(aPreservedArray) Then ReDimPreserve = aPreservedArray
End If
End Function
As you mentioned in the comments, if you are going to continue this way you definitely need to move that redim inside the if statement:
I think this redimming multi-dimensional arrays is overkill for you. I have a few recommendations:
Ranges
I notice that you are using 2 values to represent the start of a range and end of a range (years(i,2) is the start and years(i,3) is the end). Instead why not just use an actual range?
Create a range variable called
startNode
and when you find the end of the range create aRange
object like withRange(startNode,endNode)
.Your code will look something like this:
1D Array
Now you do not need to store 3 values! Just an array of ranges Which you can redim like this:
Note that the only reason that
ReDimPreserve
was created was so that you can redim both dimensions of a 2D array (normally you can only change the second dimension). With a 1D array you can freely redim without any troubles! :)For Each Loop
Lastly I recommend that you use a
for each
loop instead of a regular for loop. It makes your intentions for the loop more explicit which makes your code more readable.Hope this helps! :)
I promised a fuller answer. Sorry it is later than I expected:
As I said in my first comment:
causes
aArrayToPreserve
to have the default type of Variant. This does not match:As you discovered, redefining Years as a Variant, fixes the problems. An alternative approach would be to amend the declaration of
ReDimPreserve
soaArrayToPreserve
is an array of type String. I would not recommend that approach since you are storing both strings and numbers in the array. A Variant array will handle either strings or numbers while a String array can only handle numbers by converting them to strings for storage and back to numbers for processing.I tried your macro with different quantities of data and different amendments and timed the runs:
As I said in my second comment,
ReDim Preserve
is slow for both the inbuilt method and the VBA routine you found. For every call it must:ReDim Preserve
is a very useful method but it must be used with extreme care. Sometimes I find that sizing an array to the maximum at the beginning and using ReDim Preserve to cut the array down to the used size at the end is a better technique. The best techniques shown below determine the number of entries required before sizing the array.At the bottom of your routine, I added:
This resulted in the following being output to the Immediate Window:
Since you have called the array
Years
, I doubt my string values are anything like yours. This does not matter. What matters, is that I doubt this output was exactly what you wanted.If you write:
The lower bounds are set to the value specified by the
Option Base
statement or zero if there is noOption Base
statement. You have lower bounds for both dimensions of zero which you do not use. This is the reason for the “|||” at the top. There is another “|||” at the end which means you are creating a final row which you are not using. The final used row does not have an end row which I assume in a mistake.When I can divide a routine into steps, I always validate the result of one step before advancing to the next. That way, I know any problems are within the current step and not the result of an error in an earlier step. I use
Debug.Print
to output to the Immediate Window most of the time. Only if I want to output a lot of diagnostic information will I write to a text file. Either way, blocks of code like mine are a significant aid to rapid debugging of a macro.I would never write
ReDim Years(1, 3)
. I always specify the lower bound so as to be absolutely clear. VBA is the only language I know where you can specify any value for the lower bound (providing it is less than the upper bound) so I will specify non-standard values if is helpful for a particular problem. In this case, I see not advantage to a lower bound other than one so that is what I have used.With two dimensions arrays it is conventional to have columns as the first dimension and rows as the second. One exception is for arrays read from or to be written to a worksheet for which the dimensions are the other way round. You have rows as the first dimension. If you have used the conventional sequence you could have used the
ReDim Preserve
method, thereby avoiding theRedimPreserve
function and the problem of non-matching types.Technique 1
I expected this to be the fastest technique. Experts advise us to avoid “re-inventing the wheel”. That is, if Excel has a routine that will do what you want, don’t code an alternative in VBA. However, I have found a number of examples where this is not true and I discovered this technique was one of them.
The obvious technique here is to use
Filter
, then create a range of the visible rows usingSpecialCells
and finally process each row in this range. I have used this technique very successfully to meet other requirements but not here.I did not know the VBA to select unique rows so started the macro recorder and filtered my test data from the keyboard to get:
My past uses of
Filter
have all converted to AutoFilter which I have found to give acceptable performance. This converted toAdvancedFilter
which took 20 seconds both from the keyboard and from VBA. I do not know why it is so slow.The second problem was that:
was rejected as “too complicated”.
Not being able to get the visible rows as a range means the benefits of
Filter
are not really available. I have counted the visible rows to simulate havingRngUnique.Rows.Count
. This shows the technique which has always worked withAutoFilter
. IfAdvancedFilter
had reported the unique rows in an accepted time I might have investigated this problem but under the circumstances it does not seem worth the effort.The macro demonstrating this technique is:
The output to the Immediate Window is:
As you can see the last row is correct. A duration of 20 seconds is better than the 8 minutes of your technique but I am sure we can do better.
Technique 2
The next macro is similar to the last one but it counts the unique rows rather than use AdvancedFilter to hide the non-unique rows. This macro has a duration of 1.5 seconds with 35,000 rows. This demonstrates that counting how many rows are required for an array in a first pass of the data is a viable approach. The diagnostic output from this macro is the same as above.
Technique 3
The next macro is only slightly changed from the last.
Firstly, I have replaced the literals used to identify the column numbers in worksheets and arrays with constants such as:
Under my naming convention
ColYrEnd
= Column of Year array holding range End hence:This makes no difference to the compiled code but makes the source code easier to understand because you do not have to remember what columns 1, 2 and 3 hold. More importantly, if you ever have to rearrange the columns, updating the constants is the only change required. If you ever have to search through a long macro replacing every use of 2 as a column number (while ignoring any other use of 2) by 5, you will know why this is important.
Secondly, I have used:
to import column 1 to an array. The code that read the values from the worksheet now reads them from this array. Array access is much faster than worksheet access so this reduces the runtime from 1.5 seconds to .07 seconds.
The revised code is:
Other techniques
I considered introducing other techniques but I decided they were not useful for this requirement. Also, this answer is already long enough. I have provided much for you to think about and more would just be overload. As stated above I have reduced the run time for 35,000 rows from 8 minutes to 20 seconds to 1.5 seconds to .07 seconds.
Work slowly through my macros. I have hope I have provided adequate explanation of what each is doing. Once you know a statement exists, it is generally easy to look it up so there is not too much explanation of the statements. Come back with questions as necessary.
As stated earlier in comments, ReDim Preserve is an expensive call when working with large datasets and is generally avoided. Here is some commented code that should perform as desired. Tested on a dataset with 200,000 rows, it took less than 5 seconds to complete. Tested on a dataset with 1000 rows, it took less that 0.1 seconds to complete.
The code uses a Collection to get the unique values out of column A, and then builds the array based on those unique values and outputs the results to another sheet. In your original code, there was nowhere that the resulting array was output, so I just made something up and you'll need to adjust the output section as needed.