I have a data set in excel with measurements taken every 15 minutes (one measurement one row), where I would like to keep only the ones taken on a full hour and delete the rest. In other words: I would like to keep the first row, delete the next three, keep the 5th row, delete the next three and so on. I would like to use VBA but I'm completely new to this. I've found this macro here, which deletes every four rows
Sub remove_rows()
Dim x As Long
Application.ScreenUpdating = False
For x = 100 To 1 Step -5
Range(x & ":" & x - 3).EntireRow.Delete
Next x
Application.ScreenUpdating = True
End Sub
(Source: Delete every four rows in excel) How do I change it to delete only every three rows? I have to do the same with a dataset with measurements taken every 5 minutes (keep the 1 row, delete the next eleven, keep the 12th and so on). Is this macro good for this dataset as well? And lastly - is using VBA the best solution for this problem or is there another method which is better? The datasets are fairly big (100k+ rows).
Powerquery (2016 in the data tab > Get and Transform, 2013 free add-in from microsoft then powerquery tab) is absolutely optimized for this sort of operation over the number of rows you mention. It takes < 1 minute to write the query.
1) Select row in data, then data > from table
2) Indicate whether you table has headers
3) Query editor screen pops up
4) Select Home > Remove rows > Remove alternate row
5) Specify pattern. For example, keep 1 remove 3 as in your example:
6) Hit Ok. Observe new pattern fits requirement
7) Home > Close and Load >
8) Specify where to close and load to e.g. new sheet
You can load to same sheet or new sheet. Bingo you have your new data set. Anytime you add rows to your original dataset you can simply refresh this query and the result set will update removing the unwanted rows.
Output:
Refresh query (green circling arrows):
1) Next to workbook query itself
2) Or from ribbon > Refresh
1st version bellow (
AutoFilter
) is very fast - 2 seconds, for 100 K rowsTo change the row interval, update the
4
inConst FRM
(formula).
Version 1 - using an
AuroFilter
.
Version 2 - using a
For
loop.
I think the code above is a little dangerous and will require tweaking for a different total number of rows. You could modify as follows:
But you have to make sure you're starting at the right place and preserving the right cells. The Step -4 steps back four cells at a time, then the following line deletes row x, x-1, and x-2.
Why not just create a column that indicates whether the observation ends at an hour, then sort the list by that column and delete everything after the transition point? It's less automated, but also less likely to cause a problem.
If you really wanted to go the VBA route, I'd check in the code to ensure the observation is hourly and only then delete. I'm not big into trusting my data, though.